Issue of 2026-06-21

This Issue's Picks from Top Robotics Journals

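This issue surveys 663 robotics papers published online between 2026-03-23 and 2026-06-21, covering Science Robotics, IEEE T-RO, IJRR, RA-L, Autonomous Robots, and the Journal of Field Robotics. Overall, interest in robot foundation models and large models keeps rising: the quadrotor foundation policy RAPTOR, a rigorous empirical study of large behavior models (LBMs) for dexterous manipulation, and perspectives on foundation models for swarms and multi-robot systems. Embodied hardware is just as strong: a hopping robot that reaches 3.6 meters, electrofluidic fiber artificial muscles, the transformable lunar rover SORA-Q, and a minimally invasive spinal surgery robot. Below are 8 editor's picks, followed by the top recommendations for each of 11 research directions.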

663 papers in total: RA-L 465 · JFR 64 · T-RO 58 · Sci. Robotics 32 · IJRR 26 · AuRo 18

Editor's Picks

1
Sci. Robotics 2026-06-10

Precise aggressive aerial maneuvers with sensorimotor policies

Tianyue Wu, Guangtong Xu, Zihan Wang, Junxiao Lin, Tianyang Chen, Yuze Wu, Zhichao Han, Zhiyang Liu, et al.

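Using only lightweight onboard sensors, an end-to-end sensorimotor policy flies a quadrotor aggressively through narrow gaps at tilted SE(3) attitudes, moving such maneuvers from reliance on external motion capture to full onboard autonomy.

Highlight: Takes extreme agile flight out of lab motion-capture conditions and onto real onboard sensing, a milestone for drone autonomy.

Aerial Robots & UAVs · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract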

Precise aggressive maneuvers with lightweight onboard sensors remain a key bottleneck in fully exploiting the maneuverability of drones. Such maneuvers are critical for expanding the systems’ accessible area by navigating through narrow openings in the environment. One of the most relevant problems is aggressive traversal through narrow gaps with quadrotors under constraints in the special Euclidean group of three dimensions [SE(3)], which requires the quadrotors to leverage a momentary tilted attitude and the asymmetry of the airframes to navigate through gaps. Here, we achieved such maneuvers by developing sensorimotor policies directly mapping onboard vision and proprioception into low-level control commands. The policies were trained using reinforcement learning (RL) with end-to-end policy distillation in simulation. We mitigated the model-free RL’s exploration challenge on the restricted solution space with an initialization strategy leveraging trajectories generated by a model-based planner. Careful sim-to-real design allowed the policy to control a quadrotor through narrow gaps with low clearances and high repeatability. For instance, the proposed method enabled a quadrotor to navigate a rectangular gap at a 5-centimeter clearance, tilted at an orientation up to 90°, without knowledge of the gap’s position or orientation. Without training on dynamic gaps, the policy could reactively servo the quadrotor to traverse through a moving gap. The proposed method was validated on challenging tracks of narrow, closely placed gaps. The flexibility of the policy learning method was demonstrated by developing policies on geometrically diverse gaps without relying on manually defined traversal poses and visual features.

2
Sci. Robotics 2026-05-13

RAPTOR: A foundation policy for quadrotor control

Jonas Eschmann, Dario Albani, Giuseppe Loianno

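RAPTOR trains a single quadrotor "foundation control policy" that adapts zero-shot to previously unseen airframes and disturbances without new system identification, directly addressing the tendency of RL policies to overfit a single environment.

Highlight: Applies foundation-model thinking to policy generalization and sim-to-real transfer; one of the paradigms most worth watching this quarter.

Aerial Robots & UAVs · Robot Learning
Abstract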

Humans are remarkably data efficient when adapting to previously unseen conditions, like driving a new car. In contrast, modern robotic control systems, like neural network policies trained using reinforcement learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down even under small differences like the simulation-to-reality gap and require system identification and retraining for even minimal changes to the system. Here, we present RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. Our method enables training a single, end-to-end neural network policy to control a wide variety of quadrotors. We tested 10 different real quadrotors, from 32 grams to 2.4 kilograms, that also differed in motor type (brushed versus brushless), frame type (soft versus rigid), propeller type (two, three, or four blades), and flight controller (PX4, Betaflight, Crazyflie, M5StampFly). We found that a tiny, three-layer policy with only 2084 parameters was sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through in-context learning was made possible by using a recurrence in the hidden layer. The policy was trained through our proposed meta-imitation learning algorithm, where we sampled 1000 quadrotors and trained a teacher policy for each of them using RL. The 1000 teachers were distilled into a single, adaptive student policy. We found that within milliseconds, the resulting foundation policy adapted zero-shot to unseen quadrotors. We tested the capabilities of the foundation policy under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, and different propellers).

3
Sci. Robotics 2026-04-15

A careful examination of large behavior models for multitask dexterous manipulation

Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, et al.

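An unusually rigorous evaluation of large behavior models (LBMs) for multitask dexterous manipulation, on real hardware and in simulation, that quantifies their actual capabilities and limits, cools the hype around general-purpose robot foundation models, and sets a bar for evaluation.

Highlight: Amid a largely optimistic narrative, it provides a solid, reproducible empirical benchmark; required reading for researchers.

Manipulation & Grasping · Robot Learning · Perception & Sensing
Abstract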

Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. Although these models have garnered considerable enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, limiting the pace of development and inhibiting a nuanced understanding of current capabilities. Here, we rigorously evaluated multitask robot manipulation policies, referred to as large behavior models, by extending the diffusion policy paradigm across a corpus of simulated and real-world robot data. We proposed and validated an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compared against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We found that multitask pretraining made the policies more successful and robust and enabled teaching complex new tasks more quickly, using a fraction of the data when compared with single-task baselines. Moreover, performance predictably increased as pretraining scale and diversity grows.
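
The abstract stresses analyzing success rates "with statistical confidence" but does not say here which interval the authors use. Purely as an illustrative sketch, a Wilson score interval is one common way to put error bars on a success rate measured over a fixed number of rollouts. The function name and the trial counts below are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical numbers: a policy succeeds in 40 of 50 rollouts.
low, high = wilson_interval(40, 50)
print(f"observed success rate 0.80, interval [{low:.3f}, {high:.3f}]")
```

With only 50 rollouts the interval is wide, which is why comparisons between multitask and single-task policies need enough trials before a difference in success rate can be called meaningful.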

4
T-RO 2026-06-08

Parallel-Elastic Actuation with Reactive Latch Elevates Robotic Hopping Performance: Jump Height and Continuity

Songnan Bai, Runze Ding, Song Li, Ruihan Jia, Ruobing Wang, Zhiyuan Zhang, Fangzheng Wang, Pakpong Chirarattananon

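Parallel-elastic actuation with a reactive latch optimizes energy storage and release, letting a legged robot hop as high as 3.6 meters; the abstract describes the resulting height and continuity as previously unattainable, with the peak height surpassing both human and animal records.

Highlight: A mechanism-level innovation yields a large performance jump and sets a new benchmark for legged and hopping robots.

Aerial Robots & UAVs · Legged & Quadruped Robots · Control & Dynamics
Abstract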

While many animals exhibit impressive hopping capabilities, machines have struggled to match their performance. Current hopping robots face limitations in power density, energy efficiency, and control stability. Here, we present a parallel-elastic actuation mechanism with a reactive latch that optimizes energy transfer, enabling a legged robot to achieve hopping heights and continuity previously unattainable. This mechanism efficiently stores and releases energy, extending the actuation period over the aerial phase while minimizing stance time. Our robot achieves a maximum hopping height of 3.6 meters, surpassing both human and animal records while demonstrating sustained, high-frequency hopping cycles with minimal power requirement. By integrating inertia-based onboard sensorimotor autonomy, we demonstrate stable, controlled hopping in environments without external aid. These results represent a step toward bridging the performance gap between biological and robotic locomotion, with potential to influence the design of future legged systems.

5
Sci. Robotics 2026-06-10

From ball to rover: Transformable palm-sized rover SORA-Q for autonomous lunar exploration

D. Hirano, M. Inazawa, M. Sutoh, M. Nagata, Y. Yoneda, K. Watanabe, H. Sawada, G. Sakoda, et al.

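The palm-sized SORA-Q transforms from a sphere into a two-wheeled rover and carried out autonomous lunar surface exploration under strict payload and compute constraints; a miniature robot that actually flew to the Moon.

Highlight: A complete loop from engineering constraints to a real mission, and a representative work in planetary exploration robotics.

Legged & Quadruped Robots · Navigation / SLAM / Autonomous Driving · Human-Robot Interaction / Teleoperation
Abstract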

Robotic technologies are expected to drive substantial advancements in planetary exploration and resource prospecting by performing a variety of tasks in extraterrestrial environments. In particular, miniature robots are ideally suited for integration into spacecraft with strict payload limitations, providing a cost-effective solution. However, the pursuit of autonomous exploration using these miniature robots presents challenges owing to constraints in computational power and battery capacity and reduced locomotion performance owing to their small size. Here, we introduce a two-wheeled centimeter-scale rover, designated Lunar Excursion Vehicle 2 (LEV-2), also known as SORA-Q (named after the Japanese words for space and sphere), which transforms into a wheeled configuration from a compact spherical form, enabling efficient traversal of soft lunar terrains. On 19 January 2024 (universal time coordinated), LEV-2 was deployed from a Japanese lunar lander, Smart Lander for Investigating Moon (SLIM), immediately before its landing on the lunar surface. After a lunar landing, the palm-sized rover accomplished autonomous lunar exploration by navigating around the SLIM lander, capturing images of both the SLIM lander and its environment and transmitting selected images through wireless communication on the lunar surface without reliance on ground-based teleoperation. This study details the system design of LEV-2 and presents the results of its in situ lunar activities, highlighting the efficacy of the proposed technologies necessary for mission implementation. Furthermore, we discuss the technical challenges encountered during the mission, including operational constraints and partial data loss, as well as the lessons learned for future exploration missions using small-scale space robots.

6
Sci. Robotics 2026-05-20

A minimally invasive robotic spinal surgical system for anterior lumbar nerve decompression

Qingxiang Zhao, Xiandi Wang, Xin Zhong, Runfeng Zhu, Peizhi Zhou, Dan Pu, Baitao Lin, Tao Li, et al.

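A minimally invasive surgical robot system for anterior lumbar nerve decompression that uses greater distal dexterity and visibility to address the restricted view and incomplete decompression of conventional anterior instruments.

Highlight: Tightly couples a clinical pain point with dexterous robot design; a strong example of medical robotics translation.

Manipulation & Grasping · Navigation / SLAM / Autonomous Driving · Robot Learning · Medical / Soft / Micro-Nano
Abstract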

Lumbar degenerative diseases, primarily caused by pathological tissues compressing spinal nerves, typically necessitate surgical intervention—specifically lumbar nerve decompression—to alleviate pain. Although the anterior decompression approach demonstrates notable advantages, such as reduced bleeding and shorter postoperative hospitalization stays, compared with the conventional posterior approach, patients may still experience incomplete decompression because of various instrumental shortcomings, including restricted visibility and insufficiency of distal dexterity. In this study, we present a robotic surgical system for minimally invasive anterior lumbar nerve decompression, which comprises three slender robotic arms (2 millimeters in outer diameter) with high dexterity (18 degrees of freedom), facilitating effective navigation through the narrow intervertebral disc space to reach the posterior area. Each robot arm is based on concentric push-pull robot structure, forming three robotized instruments: an endoscope for visualization, a laser optical fiber for hemostasis and resection, and a gripper for tissue manipulation. These components are integrated through the hollow lumen of a slender trocar, and multi-instrument coordination enables effective decompression procedure with wide view. System performance was first validated using a three-dimensional–printed vertebral phantom model to confirm accessibility to bilateral articular processes. Subsequently, in vivo animal experiment and human cadaver tests were conducted to further demonstrate the full capabilities in performing minimally invasive lumbar nerve decompression. This study demonstrates the potential of the robotic system to facilitate surgical procedures in narrow, confined, and tortuous anatomical spaces, addressing the key limitations of conventional instruments in anterior lumbar nerve decompression.

7
Sci. Robotics 2026-05-20

Extreme dynamic symmetry enables omnidirectional and multifunctional robots

Jiaxun Liu, Boxi Xia, Boyuan Chen

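Proposes dynamic symmetry (measured as dynamic isotropy) as a design principle: across more than 1000 simulated morphologies, the more uniform a robot's attainable center-of-mass accelerations, the better its trajectory tracking, robustness, and energy efficiency.

Highlight: Elevates symmetry from geometric form to a unified design language for dynamic capability; conceptually strong.

Legged & Quadruped Robots · Perception & Sensing · Control & Dynamics
Abstract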

Symmetry is a central organizing principle in natural systems, yet its use as a unifying design strategy in robotics has largely remained limited to geometric form. We show that symmetry can instead be leveraged at the level of dynamic actuation capability. We introduce dynamic symmetry, the uniformity of a robot’s attainable center-of-mass accelerations, and formalize it through a measure coined as dynamic isotropy. Across more than 1000 simulated morphologies, we found that higher dynamic symmetry consistently improved trajectory tracking, task success, robustness, resiliency, and energy efficiency, with the benefits becoming most pronounced as dynamic isotropy approached its theoretical limit. To study this regime systematically, we developed Argus, a family of spherical robots designed to explore the effects of increasing dynamic symmetry. Members of the Argus family vary in their actuation geometry and dynamic symmetry level while sharing a common architectural principle: radially oriented linear actuators that directly shape the robot’s center-of-mass dynamics. Among them, we built a physical 20-leg Argus variant that achieved near-extreme dynamic isotropy and demonstrated orientation-invariant locomotion, agile traversal of cluttered and deformable terrain, rapid self-stabilization, and resilience to partial actuator failures. Its distributed sensing further enabled omnidirectional perception and object interaction during continuous motion. These results show that designing robots for symmetry not only in morphology but also in their attainable dynamics provides a powerful and general pathway toward agility, robustness, and multifunctionality in uncertain terrestrial and extraterrestrial environments.
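
The abstract defines dynamic symmetry as the uniformity of attainable center-of-mass accelerations but does not give the formula for dynamic isotropy. The sketch below is therefore not the paper's measure. It is a toy proxy under loud assumptions: each linear actuator can push or pull along a fixed axis with a bounded force, gravity and contacts are ignored, and uniformity is scored as the ratio of the smallest to the largest attainable acceleration component over a set of probe directions. All names are hypothetical.

```python
import math


def _unit(vec):
    """Return vec scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def accel_along(direction, actuator_axes, max_force=1.0, mass=1.0):
    """Largest CoM acceleration component along `direction`.

    Toy model: actuator i contributes f_i * axis_i with |f_i| <= max_force,
    so the best choice is f_i = max_force * sign(axis_i . direction).
    """
    d = _unit(direction)
    total = sum(abs(sum(a * b for a, b in zip(_unit(axis), d))) for axis in actuator_axes)
    return max_force * total / mass


def isotropy_proxy(actuator_axes, probe_directions):
    """min/max of attainable acceleration over the probes (1.0 means uniform)."""
    values = [accel_along(d, actuator_axes) for d in probe_directions]
    return min(values) / max(values)


# Three orthogonal actuators, probed along one axis and along the body diagonal.
axes = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
probes = [[1, 0, 0], [1, 1, 1]]
print(f"proxy for 3 orthogonal axes: {isotropy_proxy(axes, probes):.3f}")
```

In this toy model, adding more radially distributed axes pushes the ratio toward 1, which is the intuition behind spreading many actuators over a sphere; the paper's own measure and results should be taken from the article itself.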

8
Sci. Robotics 2026-03-25 · cited by 1

Electrofluidic fiber muscles

O. K. Afsar, G. Pupillo, G. Vitucci, W. Babatain, H. Ishii, V. Cacucciolo

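Electrofluidic fiber artificial muscles deliver power density comparable to skeletal muscle (50 W/kg), 20% contraction strain, and a 0.3 s response time, in a fiber form factor that is modular and densely integrable.

Highlight: Soft actuation has long been limited by power density and integration; this work targets both bottlenecks directly.

Control & Dynamics
Abstract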

Actuators are to robots what muscles are to humans. They enable motion and determine strength and dexterity. The fiber form factor makes skeletal muscles modular, scalable, and densely integrated (50% of human body weight). In contrast, servo motors that drive today’s robots lack the flexibility and modularity of muscle fibers, limiting integration and dexterity. Here, we report electrofluidic fiber muscles, soft artificial muscles for robotic applications with power density comparable to skeletal muscles (50 watts per kilogram), contraction strains of 20%, and response time of 0.3 second. These 2-millimeter-thick muscles comprise antagonistic fluidic actuators driven by electrohydrodynamic fiber pumps in a closed circuit. They require no external liquid reservoir and are electrically driven, untethered, and silent. We demonstrated that performance is increased by pre-pressurizing the muscles at an optimal bias pressure. Applying bias pressure allowed the antagonist actuator to act as a reservoir for the agonist, enabled 200% higher operating voltages by preventing cavitation, and leveraged the nonlinear pressure-stroke response of the actuators, increasing strain threefold at a given pump pressure. We characterized and modeled their dynamics, identifying optimal bias pressures. Electrofluidic muscles scale by simply bundling fibers. By selecting the ratio between pumps and actuators, we programmed their performance for different robotic tasks: a fast lever (180 millimeters per second) that launches objects in <0.3 second; a strong bundle that lifts 4 kilograms (200 times its weight) with a 30-millimeter stroke; a woven muscle that bends a robot arm by 40° and is compliant enough for a human handshake.
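
A quick back-of-envelope check of the figures quoted in the abstract can be sketched as follows. The inputs are the abstract's numbers; the derived bundle power assumes, as a simplification not stated in the source, that the headline 50 W/kg applies to the whole bundle.

```python
G = 9.81  # standard gravity in m/s^2

payload_kg = 4.0                 # "lifts 4 kilograms"
payload_to_weight = 200          # "200 times its weight"
stroke_m = 0.030                 # "30-millimeter stroke"
power_density_w_per_kg = 50.0    # "50 watts per kilogram"

# Bundle mass implied by the payload-to-weight ratio.
bundle_mass_kg = payload_kg / payload_to_weight

# Mechanical work needed to raise the payload through one full stroke.
lift_work_j = payload_kg * G * stroke_m

# Assumption: the headline power density applies to this bundle as a whole.
bundle_power_w = power_density_w_per_kg * bundle_mass_kg

print(f"implied bundle mass: {bundle_mass_kg * 1000:.0f} g")
print(f"work for one full-stroke lift: {lift_work_j:.2f} J")
print(f"power at 50 W/kg: {bundle_power_w:.1f} W")
```

The abstract does not state how long the 4-kilogram lift takes, so no lifting time or delivered power for that demonstration is inferred here.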

By Direction

🛸 Aerial Robots & UAVs · 64 papers

Sci. Robotics 2026-06-10

Precise aggressive aerial maneuvers with sensorimotor policies

Tianyue Wu, Guangtong Xu, Zihan Wang, Junxiao Lin, Tianyang Chen, Yuze Wu, Zhichao Han, Zhiyang Liu, et al.

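Using only lightweight onboard sensors, an end-to-end sensorimotor policy flies a quadrotor aggressively through narrow gaps at tilted SE(3) attitudes, moving such maneuvers from reliance on external motion capture to full onboard autonomy.

Highlight: Takes extreme agile flight out of lab motion-capture conditions and onto real onboard sensing, a milestone for drone autonomy.

Aerial Robots & UAVs · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing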

Sci. Robotics 2026-05-13

RAPTOR: A foundation policy for quadrotor control

Jonas Eschmann, Dario Albani, Giuseppe Loianno

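RAPTOR trains a single quadrotor "foundation control policy" that adapts zero-shot to previously unseen airframes and disturbances without new system identification, directly addressing the tendency of RL policies to overfit a single environment.

Highlight: Applies foundation-model thinking to policy generalization and sim-to-real transfer; one of the paradigms most worth watching this quarter.

Aerial Robots & UAVs · Robot Learning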

Sci. Robotics 2026-03-25 · cited by 1

Milliwatt ultrasound for navigation in visually degraded environments on palm-sized aerial robots

Manoj Velmurugan, Phillip Brush, Colin Balfour, Richard J. Przybyla, Nitin J. Sanket

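Milliwatt-level ultrasound lets palm-sized aerial robots navigate in visually degraded environments (low light, dust, fog) at far lower power than radar, making it suitable for tiny platforms.

Highlight: A new low-power answer for autonomous navigation of micro drones when both GPS and vision fail.

Aerial Robots & UAVs · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract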

Tiny palm-sized aerial robots have exceptional agility and cost-effectiveness in navigating confined and cluttered environments. However, their limited payload capacity directly constrains the sensing suite onboard the robot, thereby limiting critical navigational tasks in Global Positioning System (GPS)–denied wild scenes. Common methods for obstacle avoidance use cameras and light detection and ranging (LIDAR), which become ineffective under visually degraded conditions such as low visibility, dust, fog, or darkness. Other sensors, such as radio detection and ranging (RADAR), have high power consumption, making them unsuitable for tiny aerial robots. Inspired by bats, we propose Saranga, a low-power, ultrasound-based perception stack that localizes obstacles using a dual sonar array. We present two key solutions to combat the low peak signal-to-noise ratio of −4.9 decibels: physical noise reduction and a deep learning–based denoising method. First, we present a practical way to block propeller-induced ultrasound noise on the weak echoes. The second solution is to train a neural network to use the long horizon of ultrasound echoes for finding signal patterns under high amounts of uncorrelated noise where classical methods were insufficient. We generalized to the real world by using a synthetic data generation pipeline augmented with limited real noise data for training. We enabled a palm-sized aerial robot to navigate under visually degraded conditions of dense fog, darkness, and snow in a cluttered environment with thin and transparent obstacles using only onboard sensing and computation. We provide extensive real-world results to demonstrate the efficacy of our approach.
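
Two numbers help put the abstract in context. A signal-to-noise ratio of −4.9 decibels means the echo carries less power than the noise, and sonar ranging in general maps an echo's round-trip delay to distance. The sketch below covers only these textbook relations; it says nothing about Saranga's actual signal processing. The speed of sound (343 m/s, air at roughly 20 °C) and the function names are assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # nominal value for air at about 20 degC (assumption)


def db_to_power_ratio(snr_db: float) -> float:
    """Convert a signal-to-noise ratio in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def echo_range_m(round_trip_delay_s: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance to a reflector from the round-trip delay of its echo (d = c * t / 2)."""
    if round_trip_delay_s < 0:
        raise ValueError("delay must be non-negative")
    return 0.5 * speed_m_s * round_trip_delay_s


print(f"-4.9 dB as a power ratio: {db_to_power_ratio(-4.9):.3f}")
print(f"range for a 10 ms round trip: {echo_range_m(0.010):.3f} m")
```

At −4.9 dB the peak echo power is roughly a third of the noise power, which is why the abstract pairs physical noise blocking with learned denoising.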

🧍 Humanoid Robots · 26 papers

RA-L 2026-06-15

DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction

Jingkai Sun, Gang Han, Pihai Sun, Wen Zhao, Jiahang Cao, Jiaxu Wang, Qiang Zhang, Yijie Guo

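DPL achieves terrain-aware humanoid locomotion from depth input alone: realistic depth synthesis and cross-attention terrain reconstruction remove the reliance on elevation-map pipelines.

Highlight: Advances the integration of perception, terrain, and gait for humanoids; a depth-only approach is easier to deploy.

Humanoid Robots · Legged & Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract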

Recent advancements in legged robot perceptive locomotion have shown promising progress. However, terrain-aware humanoid locomotion remains largely constrained to two paradigms: depth image-based end-to-end learning and elevation map-based methods. The former suffers from limited training efficiency and a significant sim-to-real gap in depth perception, while the latter depends heavily on multiple vision sensors and localization systems, resulting in latency and reduced robustness. To overcome these challenges, we propose a novel framework that tightly integrates three key components: (1) Terrain-Aware Locomotion Policy with a Blind Backbone, which leverages pre-trained elevation map-based perception to guide reinforcement learning with minimal visual input; (2) Multi-Modality Cross-Attention Transformer, which reconstructs structured terrain representations from noisy depth images; (3) Realistic Depth Images Synthetic Method, which employs self-occlusion-aware ray casting and noise-aware modeling to synthesize realistic depth observations, achieving over 30% reduction in terrain reconstruction error. This combination enables efficient policy training with limited data and hardware resources, while preserving critical terrain features essential for generalization. We validate our framework on a full-sized humanoid robot, demonstrating agile and adaptive locomotion across diverse and challenging terrains.

RA-L 2026-06-15

Collision-Free Humanoid Traversal in Cluttered Indoor Scenes

Han Xue, Sikai Liang, Zhikai Zhang, Zicheng Zeng, Yun Liu, Yunrui Lian, Jilong Wang, Qingtao Liu, et al.

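Collision-free humanoid traversal of cluttered indoor scenes: hurdling over objects on the floor, crouching under low obstacles, and squeezing through narrow passages, by coupling obstacle perception with learned whole-body traversal skills.

Highlight: Moves humanoids from simply being able to walk toward moving flexibly through real, cluttered environments.

Humanoid Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract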

We study the problem of collision-free humanoid traversal in cluttered indoor scenes, such as hurdling over objects scattered on the floor, crouching under low-hanging obstacles, or squeezing through narrow passages. To achieve this goal, the humanoid needs to map its perception of surrounding obstacles with diverse spatial layouts and geometries to the corresponding traversal skills. However, the lack of an effective representation that captures humanoid–obstacle relationships during collision avoidance makes directly learning such mappings difficult. We therefore propose Humanoid Potential Field (HumanoidPF), which encodes these relationships as collision-free motion directions, significantly facilitating RL-based traversal skill learning. We also find that HumanoidPF exhibits a surprisingly negligible sim-to-real gap as a perceptual representation. To further enable generalizable traversal skills through diverse and challenging cluttered indoor scenes, we further propose a hybrid scene generation method, incorporating crops of realistic 3D indoor scenes and procedurally synthesized obstacles. We successfully transfer our policy to the real world and develop a teleoperation system where users could command the humanoid to traverse in cluttered indoor scenes with just a single click. Extensive experiments are conducted in both simulation and the real world to validate the effectiveness of our method.

Sci. Robotics 2026-05-13 · cited by 1

“Humanoids will soon replace most human workers”: A debate

Hesheng Wang, Michael Wang, Frank Park, Lijun Han, Huichan Zhao, XingXing Wang, Andra Keay, Shigeki Sugano, et al.

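A record of the live debate at IROS 2025 (Zhejiang): will humanoid robots soon replace most human workers? The arguments from both sides are presented together.

Highlight: A clash of views between leading researchers and industry experts; good material for forming a judgment on the humanoid sector.

Humanoid Robots
Abstract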

At the 2025 International Conference of Intelligent Robots and Systems (IROS) in Zhejiang, China, leading researchers and industry experts debated whether humanoid robots will soon replace human workers. A summary of the points raised by the debaters for or against the claim is highlighted below.

🐾 Legged & Quadruped Robots · 66 papers

T-RO 2026-06-08

Parallel-Elastic Actuation with Reactive Latch Elevates Robotic Hopping Performance: Jump Height and Continuity

Songnan Bai, Runze Ding, Song Li, Ruihan Jia, Ruobing Wang, Zhiyuan Zhang, Fangzheng Wang, Pakpong Chirarattananon

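Parallel-elastic actuation with a reactive latch optimizes energy storage and release, letting a legged robot hop as high as 3.6 meters; the abstract describes the resulting height and continuity as previously unattainable, with the peak height surpassing both human and animal records.

Highlight: A mechanism-level innovation yields a large performance jump and sets a new benchmark for legged and hopping robots.

Aerial Robots & UAVs · Legged & Quadruped Robots · Control & Dynamics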

Sci. Robotics 2026-05-20

Extreme dynamic symmetry enables omnidirectional and multifunctional robots

Jiaxun Liu, Boxi Xia, Boyuan Chen

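Proposes dynamic symmetry (measured as dynamic isotropy) as a design principle: across more than 1000 simulated morphologies, the more uniform a robot's attainable center-of-mass accelerations, the better its trajectory tracking, robustness, and energy efficiency.

Highlight: Elevates symmetry from geometric form to a unified design language for dynamic capability; conceptually strong.

Legged & Quadruped Robots · Perception & Sensing · Control & Dynamics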

Sci. Robotics 2026-06-10

Therapist-exoskeleton-patient interaction for gait therapy

Emek Barış Küçüktabak, Matthew R. Short, Lorenzo Vianello, Daniel Ludvig, Levi Hargrove, Kevin Lynch, Jose Pons

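A therapist-exoskeleton-patient three-way interaction framework for post-stroke gait rehabilitation that supports simultaneous multijoint assistance, reduces therapist strain, and provides objective feedback.

Highlight: Turns the rehabilitation robot from a replacement for the therapist's hands into a human-robot collaboration, with strong clinical fit.

Legged & Quadruped Robots · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract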

After a stroke, individuals often experience mobility impairments because of weakness and loss of independent joint control in the lower limbs. As a result, gait recovery becomes a primary goal of physical rehabilitation, traditionally achieved through high-intensity therapist-led training. However, conventional therapist-led approaches involving manual assistance or resistance can be physically demanding and limit interaction at multiple joints simultaneously. Robotic exoskeletons have emerged as a promising solution, enabling multijoint support, reducing therapist strain, and offering objective performance feedback. However, typical exoskeleton control strategies limit the physical therapist’s involvement and adaptability to the patient’s needs, which may hinder clinical adoption and outcomes. In this study, we introduce a gait rehabilitation paradigm based on physical human-robot-human interaction that we call therapist-exoskeleton-patient interaction (TEPI), in which a therapist and a patient with stroke are each equipped with a lower-limb exoskeleton virtually connected at the hips and knees via spring-damper elements. This connection enables bidirectional physical interaction, allowing the therapist to guide the patient’s movement while receiving real-time haptic feedback. We evaluated this approach with eight patients with chronic stroke using a within-subject design, comparing TEPI training with conventional therapist-guided mobilization during treadmill walking. Results showed that, compared with conventional therapy, TEPI led to greater joint range of motion, increased step length and height, similar muscle activation, and high self-reported motivation and enjoyment. These findings suggest that TEPI can integrate robotic precision with therapist intuition, offering a framework for enhancing gait rehabilitation outcomes in populations recovering from stroke.
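
The virtual spring-damper connection described in the abstract can be illustrated with a minimal sketch for one joint pair (for example, the two right knees). This is a generic spring-damper coupling, not the authors' controller: the gains, names, and sign convention are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class JointState:
    angle_rad: float
    velocity_rad_s: float


def coupling_torques(therapist: JointState, patient: JointState,
                     stiffness: float = 30.0, damping: float = 2.0) -> tuple[float, float]:
    """Torques (on therapist, on patient) from a virtual spring-damper on one joint pair.

    Gains are placeholder values in N*m/rad and N*m*s/rad, not from the paper.
    """
    tau_on_patient = (stiffness * (therapist.angle_rad - patient.angle_rad)
                      + damping * (therapist.velocity_rad_s - patient.velocity_rad_s))
    # Equal and opposite torque gives the therapist haptic feedback about the patient.
    return -tau_on_patient, tau_on_patient


# Hypothetical snapshot: the therapist's knee is flexed 0.2 rad further than the patient's.
tau_t, tau_p = coupling_torques(JointState(0.6, 0.0), JointState(0.4, 0.0))
print(f"torque on patient: {tau_p:+.2f} N*m, torque on therapist: {tau_t:+.2f} N*m")
```

Because the two torques are equal and opposite, the same element that pulls the patient's joint toward the therapist's posture also lets the therapist feel the patient's resistance, which is the bidirectional interaction the abstract describes.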

🦾 Manipulation & Grasping · 166 papers

Sci. Robotics 2026-04-15

A careful examination of large behavior models for multitask dexterous manipulation

Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, et al.

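A rare, rigorous real-robot evaluation of large behavior models (LBMs) for multitask dexterous manipulation that quantifies their actual capabilities and limits, cooling the hype around general-purpose robot foundation models and setting a standard for evaluation.

Highlight: Amid a largely optimistic narrative, it offers a solid, reproducible empirical benchmark; a must-read for researchers.

Manipulation & Grasping · Robot Learning · Perception & Sensing
Abstract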

Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. Although these models have garnered considerable enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, limiting the pace of development and inhibiting a nuanced understanding of current capabilities. Here, we rigorously evaluated multitask robot manipulation policies, referred to as large behavior models, by extending the diffusion policy paradigm across a corpus of simulated and real-world robot data. We proposed and validated an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compared against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We found that multitask pretraining made the policies more successful and robust and enabled teaching complex new tasks more quickly, using a fraction of the data when compared with single-task baselines. Moreover, performance predictably increased as pretraining scale and diversity grows.
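
The abstract stresses reporting success rates "with statistical confidence" but does not say which interval or test the authors used. As one common way to attach uncertainty to a success rate measured over a limited number of real-robot rollouts, here is a minimal sketch of the Wilson score interval. The function name and the rollout counts are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to roughly 95% confidence. This is a textbook
    interval chosen for illustration, not the paper's evaluation pipeline.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical rollout counts for two policies on the same task.
    for name, wins, runs in [("single-task", 31, 50), ("multitask", 40, 50)]:
        low, high = wilson_interval(wins, runs)
        print(f"{name}: {wins}/{runs} successes, 95% interval [{low:.3f}, {high:.3f}]")
```

With only a few dozen rollouts per policy the intervals are wide, which is why a difference in raw success rate alone says little without some interval or test attached.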

Sci. Robotics 2026-04-29

A retrieval-augmented framework enabling VLM spatial awareness for object-centric robot manipulation

Kai Chen, Chengkun Li, Chang Tu, Jiahui Pan, Yiyao Ma, Wei Chen, Zhongxiang Zhou, Xuecheng Xu, et al.

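RAM takes a retrieval-augmented, object-centric approach to connect the semantic reasoning of vision-language models to the precise geometry that manipulation requires, bridging the semantic-to-geometric gap.

Highlight: A key step toward deploying VLMs for manipulation: the model can place objects, not just describe them.

Manipulation & Grasping · Robot Learning · Perception & Sensing
Abstract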

Connecting the semantic reasoning of vision-language models (VLMs) to the precise geometric demands of robotic manipulation remains a fundamental challenge. Although VLMs can interpret high-level commands, they lack the intrinsic spatial intelligence required for tasks demanding precise object placement, orientation, and physical reasoning. Here, we introduce Retrieval-Augmented Manipulation (RAM), an object-centric framework that endows general-purpose vision foundation models with the spatial reasoning necessary for robust manipulation. RAM bridges the semantic-to-geometric gap by grounding abstract concepts into an explicit, object-centric three-dimensional (3D) representation. This grounded information is then provided as augmented context to the VLM, empowering it to decompose complex instructions into a sequence of spatially precise and physically plausible subgoals. We demonstrate that RAM, in a zero-shot setting on a real-world robot, can execute these subgoals to fulfill complex spatial language instructions, complete spatially aware manipulation under the guidance of a single 2D image, and adaptively replan tasks by reasoning about physical constraints like object size and collisions. Quantitative evaluations on the Common Object in 3D (CO3D) dataset also validated that RAM’s core vision module generalizes to previously unseen object categories and is robust to variations in shape and occlusions. By providing a structured bridge between semantic intent and geometric execution, RAM represents a critical step toward developing more physically intelligent and general-purpose robotic systems.

Sci. Robotics 2026-04-29

Dexterous grasping with an active palm

Amos Matsiko

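A tactile-responsive gripper with an active palm that enables adaptive grasping and dexterous manipulation with more degrees of freedom.

Highlight: The palm changes from passive support into an active degree of freedom, redefining the upper limit of gripper dexterity.

Manipulation & Grasping · Perception & Sensing
Abstract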

A tactile-responsive gripper with an active palm enables adaptive grasping and dexterous manipulation of objects.

🧭 Navigation, SLAM & Driving · 240 papers

Sci. Robotics 2026-06-10

From ball to rover: Transformable palm-sized rover SORA-Q for autonomous lunar exploration

D. Hirano, M. Inazawa, M. Sutoh, M. Nagata, Y. Yoneda, K. Watanabe, H. Sawada, G. Sakoda, et al.

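Palm-sized SORA-Q, which transforms from a sphere into a two-wheeled rover, achieved autonomous lunar surface exploration under severe payload and compute constraints; it is a miniature robot that actually flew to the Moon.

Highlight: A complete loop from engineering constraints to a real mission; a landmark for planetary exploration robots.

Legged & Quadruped Robots · Navigation, SLAM & Driving · Human-Robot Interaction
Abstract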

Robotic technologies are expected to drive substantial advancements in planetary exploration and resource prospecting by performing a variety of tasks in extraterrestrial environments. In particular, miniature robots are ideally suited for integration into spacecraft with strict payload limitations, providing a cost-effective solution. However, the pursuit of autonomous exploration using these miniature robots presents challenges owing to constraints in computational power and battery capacity and reduced locomotion performance owing to their small size. Here, we introduce a two-wheeled centimeter-scale rover, designated Lunar Excursion Vehicle 2 (LEV-2), also known as SORA-Q (named after the Japanese words for space and sphere), which transforms into a wheeled configuration from a compact spherical form, enabling efficient traversal of soft lunar terrains. On 19 January 2024 (universal time coordinated), LEV-2 was deployed from a Japanese lunar lander, Smart Lander for Investigating Moon (SLIM), immediately before its landing on the lunar surface. After a lunar landing, the palm-sized rover accomplished autonomous lunar exploration by navigating around the SLIM lander, capturing images of both the SLIM lander and its environment and transmitting selected images through wireless communication on the lunar surface without reliance on ground-based teleoperation. This study details the system design of LEV-2 and presents the results of its in situ lunar activities, highlighting the efficacy of the proposed technologies necessary for mission implementation. Furthermore, we discuss the technical challenges encountered during the mission, including operational constraints and partial data loss, as well as the lessons learned for future exploration missions using small-scale space robots.

IJRR 2026-06-10 · Cited by 3

EgoExo++: Integrating on-demand exocentric visuals with 2.5D ground surface estimation for interactive teleoperation of underwater ROVs

Adnan Abdullah, Ruo Chen, Ioannis Rekleitis, Md Jahidul Islam

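EgoExo++ synthesizes on-demand third-person views from first-person footage within a visual SLAM pipeline and combines them with 2.5D ground surface estimation, improving situational awareness and maneuvering precision in underwater ROV teleoperation.

Highlight: Uses view synthesis to overcome the restricted field of view in teleoperation; a practical innovation for underwater interaction.

Drones & Aerial Robots · Navigation, SLAM & Driving · Human-Robot Interaction
Abstract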

Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators’ field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we first propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. We further propose EgoExo++, which extends beyond 2D exocentric view synthesis (EgoExo) to augment a piecewise-planar 2.5D ground surface estimation on-the-fly. Its anchor-free aerial viewpoint supports ground-relative reasoning, such as clearance and terrain-based navigation marker following. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. To assess operational benefits, we conduct two user studies with simulation and real-world data, each involving 15 participants, comparing baseline egocentric teleoperation and EgoExo++. Results indicate improved system usability (SUS), reduced perceived workload (NASA-TLX), and significant gains in objective teleoperation performance, including 16% faster missions, 5-fold reduction in path deviation ratio, and fewer collision events (2 vs 5 across trials). Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics. 
The source packages for EgoExo++ are available at: https://github.com/uf-robopi/EgoExo.

T-RO 2026-06-02

LiDAR Teach, Radar Repeat: Robust Cross-Modal Navigation in Degenerate and Varying Environments

Renxiang Xiao, Yichen Chen, Yuanfan Zhang, Qianyi Shao, Yushuai Chen, Yuxuan Han, Yunjiang Lou, Liang Hu

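Cross-modal teach-and-repeat navigation that teaches with LiDAR and repeats with millimeter-wave radar, maintaining robust long-term autonomy in degenerate and changing environments, including adverse weather.

Highlight: Cross-modal Teach-and-Repeat balances accuracy with all-weather robustness; high engineering value.

Navigation, SLAM & Driving · Perception & Sensing · Control & Dynamics
Abstract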

Long-term autonomy requires robust navigation in environments subject to dynamic and static changes, as well as adverse weather conditions. Teach-and-Repeat (T&R) navigation offers a reliable and cost-effective solution by avoiding the need for consistent global mapping; however, existing T&R systems lack a systematic solution to tackle various environmental variations such as weather degradation, ephemeral dynamics, and structural changes. This work proposes LTR$^{2}$, the first cross-modal, cross-platform LiDAR-Teach-and-Radar-Repeat system that systematically addresses these challenges. LTR$^{2}$ leverages LiDAR during the teaching phase to capture precise structural information under normal conditions and utilizes 4D millimeter-wave radar during the repeating phase for robust operation under environmental degradations. To align sparse and noisy forward-looking 4D radar with dense and accurate omnidirectional 3D LiDAR data, we introduce a Cross-Modal Registration (CMR) network that jointly exploits Doppler-based motion priors and the physical laws governing LiDAR intensity and radar power density. Furthermore, we propose an adaptive fine-tuning strategy that incrementally updates the CMR network based on localization errors, enabling long-term adaptability to static environmental changes without ground-truth labels. We demonstrate that the proposed CMR network achieves state-of-the-art cross-modal registration performance on the open-access dataset. Then we validate LTR$^{2}$ across three robot platforms over a large-scale, long-term deployment (40+ km over 6 months), including challenging conditions such as nighttime smoke. Experimental results and ablation studies demonstrate centimeter-level accuracy and strong robustness against diverse environmental disturbances, significantly outperforming existing approaches.

🧠 Robot Learning & RL · 179 papers

Sci. Robotics 2026-05-13

RAPTOR: A foundation policy for quadrotor control

Jonas Eschmann, Dario Albani, Giuseppe Loianno

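RAPTOR trains a single foundation control policy for quadrotors that adapts zero-shot to previously unseen airframes and disturbances without new system identification, directly targeting the tendency of RL policies to overfit to a single environment.

Highlight: Applies foundation-model thinking to the generalization and sim-to-real transfer of control policies; one of the paradigms most worth watching this quarter.

Drones & Aerial Robots · Robot Learning
Abstract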

Humans are remarkably data efficient when adapting to previously unseen conditions, like driving a new car. In contrast, modern robotic control systems, like neural network policies trained using reinforcement learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down even under small differences like the simulation-to-reality gap and require system identification and retraining for even minimal changes to the system. Here, we present RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. Our method enables training a single, end-to-end neural network policy to control a wide variety of quadrotors. We tested 10 different real quadrotors, from 32 grams to 2.4 kilograms, that also differed in motor type (brushed versus brushless), frame type (soft versus rigid), propeller type (two, three, or four blades), and flight controller (PX4, Betaflight, Crazyflie, M5StampFly). We found that a tiny, three-layer policy with only 2084 parameters was sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through in-context learning was made possible by using a recurrence in the hidden layer. The policy was trained through our proposed meta-imitation learning algorithm, where we sampled 1000 quadrotors and trained a teacher policy for each of them using RL. The 1000 teachers were distilled into a single, adaptive student policy. We found that within milliseconds, the resulting foundation policy adapted zero-shot to unseen quadrotors. We tested the capabilities of the foundation policy under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, and different propellers).

Sci. Robotics 2026-04-15

A careful examination of large behavior models for multitask dexterous manipulation

Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, et al.

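A rare, rigorous real-robot evaluation of large behavior models (LBMs) for multitask dexterous manipulation that quantifies their actual capabilities and limits, cooling the hype around general-purpose robot foundation models and setting a standard for evaluation.

Highlight: Amid a largely optimistic narrative, it offers a solid, reproducible empirical benchmark; a must-read for researchers.

Manipulation & Grasping · Robot Learning · Perception & Sensing
Abstract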

Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. Although these models have garnered considerable enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, limiting the pace of development and inhibiting a nuanced understanding of current capabilities. Here, we rigorously evaluated multitask robot manipulation policies, referred to as large behavior models, by extending the diffusion policy paradigm across a corpus of simulated and real-world robot data. We proposed and validated an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compared against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We found that multitask pretraining made the policies more successful and robust and enabled teaching complex new tasks more quickly, using a fraction of the data when compared with single-task baselines. Moreover, performance predictably increased as pretraining scale and diversity grows.

AuRo 2026-06-10 · Cited by 4

Large language models for multi-robot systems: a survey

Peihan Li, Zijian An, Shams Abrar, Lifeng Zhou

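The first survey to systematically review applications of large language models in multi-robot systems, organized by high-level task allocation, mid-level motion planning, low-level action generation, and human intervention.

Highlight: Provides a clear map of the rapidly expanding LLM × multi-robot field.

Navigation, SLAM & Driving · Robot Learning · Multi-Robot & Swarm · Human-Robot Interaction
Abstract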

The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-Robot Systems (MRS), enabling enhanced communication, task allocation and planning, and human-robot interaction. Unlike traditional single-robot and multi-agent systems, MRS poses unique challenges, including coordination, scalability, and real-world adaptability. This survey provides the first dedicated review of LLM integration into MRS. It systematically categorizes their applications across high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. We highlight key applications in diverse domains, such as household robotics, construction, formation control, target tracking, and robot games, showcasing the versatility and transformative potential of LLMs in MRS. Furthermore, we examine the challenges that limit adapting LLMs to MRS, including mathematical reasoning limitations, hallucination, latency issues, and the need for robust benchmarking systems. Finally, we outline opportunities for future research, emphasizing advancements in fine-tuning, reasoning techniques, and task-specific models. This survey aims to guide researchers in the intelligence and real-world deployment of MRS powered by LLMs. Given the rapidly evolving nature of research in the field, we continuously update the paper list in the open-source GitHub repository .

👁️ Perception & Sensing · 229 papers

T-RO 2026-06-08

Zero-Shot Sim-to-Real 6-DoF Pose Estimation for Underwater Vehicles Based on Uncertainty-Guided Dense Correspondence

Qingbo Wei, Yi Yang, Xingqun Zhou, Zhiqiang Hu, Chuanzhi Fan, Quan Zheng, Zhichao Wang

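ZUPose is a monocular 6-DoF pose estimator trained only on synthetic data and deployed zero-shot in real underwater scenes, using uncertainty-guided dense correspondence to handle the sim-to-real gap and underwater optical degradation.

Highlight: Algorithm-data co-design addresses the scarcity of annotated underwater data; a building block for multi-AUV cooperation.

Navigation, SLAM & Driving · Robot Learning · Perception & Sensing · Multi-Robot & Swarm
Abstract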

Accurate 6-DoF relative pose estimation is essential for multi-AUV cooperative tasks. However, pose-annotated underwater data are difficult to obtain, limiting learning-based methods. We present ZUPose, a monocular pose estimator trained entirely on synthetic data and deployed directly in real underwater scenes. A key challenge is prediction noise introduced by the sim-to-real gap and underwater optical degradation. To address it, we adopt an algorithm-data co-design strategy. At the algorithm level, we develop an uncertainty-guided dense-correspondence framework in which the network jointly predicts dense correspondences and per-pixel uncertainty under a Laplace-based probabilistic formulation. The predicted uncertainty acts as a learned scale parameter to model correspondence noise and guide pose optimization. At the data level, we construct a physics-guided simulation pipeline to model underwater optical degradation and generate diverse synthetic images. In real turbid water, ZUPose achieves translation and rotation errors of 6.7 cm and 7.7$^\circ$, with both reduced by about half compared with the best-performing baseline. The method remains stable under overexposure and long-range observation, and dual-AUV navigation experiments further validate its practical viability.
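
The abstract describes per-pixel uncertainty acting as a learned scale parameter under a Laplace-based formulation, without giving the exact loss. The sketch below shows the textbook Laplace negative log-likelihood, assumed here only as an illustration of why a predicted scale can down-weight noisy correspondences. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def laplace_nll(residual: float, scale: float) -> float:
    """Negative log-likelihood of a residual under a zero-mean Laplace density.

    For Laplace(0, b): -log p(r) = |r| / b + log(2 b).
    A larger predicted scale b shrinks the weight on |r| but pays a log penalty,
    so the model cannot simply declare every pixel uncertain.
    """
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    return abs(residual) / scale + math.log(2.0 * scale)


def mean_laplace_nll(residuals: Sequence[float], scales: Sequence[float]) -> float:
    """Average the per-correspondence loss over paired residuals and scales."""
    if len(residuals) != len(scales) or not residuals:
        raise ValueError("need equally sized, non-empty inputs")
    return sum(laplace_nll(r, b) for r, b in zip(residuals, scales)) / len(residuals)


if __name__ == "__main__":
    # Hypothetical correspondence errors (pixels) with predicted scales.
    errors = [0.5, 0.4, 6.0]   # the last one mimics an outlier in turbid water
    scales = [0.5, 0.5, 6.0]   # the network assigns it a large uncertainty
    print(f"mean loss: {mean_laplace_nll(errors, scales):.3f}")
```

For a fixed residual, this loss is smallest when the scale equals the absolute residual, which is the sense in which the predicted scale models correspondence noise.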

Sci. Robotics 2026-05-20

Fusing LiDAR and vision to generate high-quality reconstructions

Amos Matsiko

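A neural radiance field-based reconstruction framework that fuses LiDAR and vision, combining geometric accuracy with high-quality appearance.

Highlight: Unifies LiDAR geometry and camera texture within a NeRF framework, raising reconstruction quality.

Perception & Sensing
Abstract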

A neural radiance field–based reconstruction framework merging LiDAR and vision data achieves geometric accuracy.

Sci. Robotics 2026-03-25 · Cited by 1

The forgotten spectrum: Reviving ultrasound for robust autonomy

Xin Zhou, Fei Gao

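Perspective article: ultrasound, often seen as outdated, can be combined with edge-AI denoising to markedly improve the robustness of autonomous systems when vision fails.

Highlight: Reminds the community to reassess the value of low-cost sensors in robust perception.

Perception & Sensing
Abstract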

Seemingly outdated ultrasound combined with edge-AI denoising can make autonomy more robust when vision fails.

🪼 Medical, Soft & Micro Robots · 70 papers

Sci. Robotics 2026-05-20

A minimally invasive robotic spinal surgical system for anterior lumbar nerve decompression

Qingxiang Zhao, Xiandi Wang, Xin Zhong, Runfeng Zhu, Peizhi Zhou, Dan Pu, Baitao Lin, Tao Li, et al.

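A minimally invasive surgical robot system for anterior lumbar nerve decompression, whose greater distal dexterity and visibility address the restricted view and incomplete decompression of conventional anterior procedures.

Highlight: Closely couples a clinical pain point with dexterous robot design; an excellent example of medical robotics translation.

Manipulation & Grasping · Navigation, SLAM & Driving · Robot Learning · Medical, Soft & Micro Robots
Abstract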

Lumbar degenerative diseases, primarily caused by pathological tissues compressing spinal nerves, typically necessitate surgical intervention—specifically lumbar nerve decompression—to alleviate pain. Although the anterior decompression approach demonstrates notable advantages, such as reduced bleeding and shorter postoperative hospitalization stays, compared with the conventional posterior approach, patients may still experience incomplete decompression because of various instrumental shortcomings, including restricted visibility and insufficiency of distal dexterity. In this study, we present a robotic surgical system for minimally invasive anterior lumbar nerve decompression, which comprises three slender robotic arms (2 millimeters in outer diameter) with high dexterity (18 degrees of freedom), facilitating effective navigation through the narrow intervertebral disc space to reach the posterior area. Each robot arm is based on concentric push-pull robot structure, forming three robotized instruments: an endoscope for visualization, a laser optical fiber for hemostasis and resection, and a gripper for tissue manipulation. These components are integrated through the hollow lumen of a slender trocar, and multi-instrument coordination enables effective decompression procedure with wide view. System performance was first validated using a three-dimensional–printed vertebral phantom model to confirm accessibility to bilateral articular processes. Subsequently, in vivo animal experiment and human cadaver tests were conducted to further demonstrate the full capabilities in performing minimally invasive lumbar nerve decompression. This study demonstrates the potential of the robotic system to facilitate surgical procedures in narrow, confined, and tortuous anatomical spaces, addressing the key limitations of conventional instruments in anterior lumbar nerve decompression.

Sci. Robotics 2026-03-25 · Cited by 1

Electrofluidic fiber muscles

O. K. Afsar, G. Pupillo, G. Vitucci, W. Babatain, H. Ishii, V. Cacucciolo

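Electrofluidic fiber artificial muscles offer power density comparable to skeletal muscle (50 W/kg), 20% contraction strain, and a 0.3 s response time, and their fiber form factor allows modular, dense integration.

Highlight: Soft actuation has long been limited by power density and integration, and this work targets that core bottleneck.

Control & Dynamics
Abstract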

Actuators are to robots what muscles are to humans. They enable motion and determine strength and dexterity. The fiber form factor makes skeletal muscles modular, scalable, and densely integrated (50% of human body weight). In contrast, servo motors that drive today’s robots lack the flexibility and modularity of muscle fibers, limiting integration and dexterity. Here, we report electrofluidic fiber muscles, soft artificial muscles for robotic applications with power density comparable to skeletal muscles (50 watts per kilogram), contraction strains of 20%, and response time of 0.3 second. These 2-millimeter-thick muscles comprise antagonistic fluidic actuators driven by electrohydrodynamic fiber pumps in a closed circuit. They require no external liquid reservoir and are electrically driven, untethered, and silent. We demonstrated that performance is increased by pre-pressurizing the muscles at an optimal bias pressure. Applying bias pressure allowed the antagonist actuator to act as a reservoir for the agonist, enabled 200% higher operating voltages by preventing cavitation, and leveraged the nonlinear pressure-stroke response of the actuators, increasing strain threefold at a given pump pressure. We characterized and modeled their dynamics, identifying optimal bias pressures. Electrofluidic muscles scale by simply bundling fibers. By selecting the ratio between pumps and actuators, we programmed their performance for different robotic tasks: a fast lever (180 millimeters per second) that launches objects in <0.3 second; a strong bundle that lifts 4 kilograms (200 times its weight) with a 30-millimeter stroke; a woven muscle that bends a robot arm by 40° and is compliant enough for a human handshake.

Sci. Robotics 2026-04-22

Designing microrobots with embodied physical intelligence

Melisa Yashinski

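A flexible chain of 3D-microprinted units exhibits complex emergent dynamics when interacting with its environment, illustrating a microrobot design approach based on embodied physical intelligence.

Highlight: Offloads part of the intelligence to the materials and structure themselves; a new paradigm for micro- and nanorobot design.

Medical, Soft & Micro Robots · Control & Dynamics
Abstract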

A flexible chain of 3D microprinted units exhibits emergent dynamics in response to environmental interactions.

🐝 Multi-Robot & Swarm · 67 papers

Sci. Robotics 2026-05-13

Cross-link collective: Entangled robotic matter with cohesive motion

Danna Ma, Baxi Chong, Daniel I. Goldman, Kirstin H. Petersen

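The cross-link collective, inspired by cross-linking in active gels: physically entangled robotic matter that stays cohesive and moves collectively without fixed connections or explicit coordination.

Highlight: Relies on physical entanglement to obtain robust, scalable collective behavior; a new form of swarm robotics.

Multi-Robot & Swarm
Abstract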

Robotic applications increasingly demand systems that are resilient, adaptable, and scalable. One promising route is through collectives of simple modules, where complex group-level behavior emerges from local interactions. By omitting fixed topologies and tight coordination, this approach sacrifices predictability and conventional tools for behaviors inherently optimized through stochastic mechanical interactions. A key challenge is maintaining cohesion and functionality without fixed connections and explicit coordination. We introduce the cross-link collective, a physically entangled robotic system inspired by cross-linking in active gels. Through shape morphing and transient entanglement, individually immobile modules produce sustained collective motion. The mechanically intelligent robot matter favors chains and phase relationships that reduce joint torques and reconfigures in response to perturbations. We show that distributed control can be added to this substrate to further enhance cohesion. Leveraging weak, reversible connections, the cross-link collective is adaptable, scalable, and fault tolerant, offering insights to applications from soft matter and robotics.

AuRo 2026-06-10 · Cited by 4

Large language models for multi-robot systems: a survey

Peihan Li, Zijian An, Shams Abrar, Lifeng Zhou

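The first survey to systematically review applications of large language models in multi-robot systems, organized by high-level task allocation, mid-level motion planning, low-level action generation, and human intervention.

Highlight: Provides a clear map of the rapidly expanding LLM × multi-robot field.

Navigation, SLAM & Driving · Robot Learning · Multi-Robot & Swarm · Human-Robot Interaction
Abstract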

The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-Robot Systems (MRS), enabling enhanced communication, task allocation and planning, and human-robot interaction. Unlike traditional single-robot and multi-agent systems, MRS poses unique challenges, including coordination, scalability, and real-world adaptability. This survey provides the first dedicated review of LLM integration into MRS. It systematically categorizes their applications across high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. We highlight key applications in diverse domains, such as household robotics, construction, formation control, target tracking, and robot games, showcasing the versatility and transformative potential of LLMs in MRS. Furthermore, we examine the challenges that limit adapting LLMs to MRS, including mathematical reasoning limitations, hallucination, latency issues, and the need for robust benchmarking systems. Finally, we outline opportunities for future research, emphasizing advancements in fine-tuning, reasoning techniques, and task-specific models. This survey aims to guide researchers in the intelligence and real-world deployment of MRS powered by LLMs. Given the rapidly evolving nature of research in the field, we continuously update the paper list in the open-source GitHub repository .

T-RO 2026-06-04

Fault-Tolerant Multi-Modal Localization of Multi-Robots on Matrix Lie Groups

Mahboubeh Zarei, Robin Chhabra

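Proposes a fault-tolerant, multi-modal multi-robot localization framework on matrix Lie groups, with stochastic operations for composition, differencing, inversion, averaging, and fusion of correlated and non-correlated estimates on Lie groups.

Highlight: Offers a mathematically consistent, fault-tolerant unified framework for cooperative multi-robot localization.

Navigation, SLAM & Driving · Perception & Sensing · Multi-Robot & Swarm
Abstract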

Consistent localization of cooperative multi-robot systems during navigation presents substantial challenges. This paper proposes a fault-tolerant, multi-modal localization framework for multi-robot systems on matrix Lie groups. We introduce novel stochastic operations to perform composition, differencing, inversion, averaging, and fusion of correlated and non-correlated estimates on Lie groups, enabling pseudo-pose construction for filter updates. The method integrates a combination of proprioceptive and exteroceptive measurements from inertial, velocity, and pose (pseudo-pose) sensors on each robot in an Extended Kalman Filter (EKF) framework. The prediction step is conducted on the Lie group $\mathbb{SE}_{2}(3) \times \mathbb{R}^{3} \times \mathbb{R}^{3}$, where each robot's pose, velocity, and inertial measurement biases are propagated. The proposed framework uses body velocity, relative pose measurements from fiducial markers, and inter-robot communication to provide scalable EKF update across the network on the Lie group $\mathbb{SE}(3) \times \mathbb{R}^{3}$. A fault detection module is implemented, allowing the integration of only reliable pseudo-pose measurements from fiducial markers. We demonstrate the effectiveness of the method through experiments with a network of wheeled mobile robots equipped with inertial measurement units, wheel odometry, and ArUco markers. The comparison results highlight the proposed method's real-time performance, superior efficiency, reliability, and scalability in multi-robot localization, making it well-suited for large-scale robotic systems.
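
The composition, inversion, and differencing operations named in the abstract can be illustrated on a much simpler group. The sketch below is a deterministic, planar SE(2) analog written for this digest. It is not the paper's method, which works on $\mathbb{SE}_{2}(3)$ and $\mathbb{SE}(3)$ and also handles uncertainty and correlated estimates. All function names and the example poses are hypothetical.

```python
import math

Pose = tuple[float, float, float]  # (x, y, heading in radians)


def wrap(angle: float) -> float:
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a: Pose, b: Pose) -> Pose:
    """Group composition a * b: express pose b (given in a's frame) in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(at + bt))


def inverse(a: Pose) -> Pose:
    """Group inverse: for T = (R, p) the inverse is (R^T, -R^T p)."""
    x, y, t = a
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, wrap(-t))


def difference(a: Pose, b: Pose) -> Pose:
    """Relative pose of b as seen from a, i.e. inverse(a) * b."""
    return compose(inverse(a), b)


if __name__ == "__main__":
    robot_a = (1.0, 2.0, math.pi / 2)   # hypothetical world-frame poses
    robot_b = (1.0, 3.0, math.pi)
    rel = difference(robot_a, robot_b)  # what a fiducial-marker sighting would measure
    print("b in a's frame:", tuple(round(v, 3) for v in rel))
    print("recomposed b:  ", tuple(round(v, 3) for v in compose(robot_a, rel)))
```

In a cooperative setting, a relative measurement like `rel` combined with one robot's own estimate yields a pose estimate for the other robot; the paper's contribution concerns doing this consistently with covariances and faulty measurements, which this sketch leaves out.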

🤝 Human-Robot Interaction · 48 papers

IJRR 2026-06-10 · Cited by 3

EgoExo++: Integrating on-demand exocentric visuals with 2.5D ground surface estimation for interactive teleoperation of underwater ROVs

Adnan Abdullah, Ruo Chen, Ioannis Rekleitis, Md Jahidul Islam

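EgoExo++ synthesizes on-demand third-person views from first-person footage within a visual SLAM pipeline and combines them with 2.5D ground surface estimation, improving situational awareness and maneuvering precision in underwater ROV teleoperation.

Highlight: Uses view synthesis to overcome the restricted field of view in teleoperation; a practical innovation for underwater interaction.

Drones & Aerial Robots · Navigation, SLAM & Driving · Human-Robot Interaction
Abstract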

Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators’ field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we first propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. We further propose EgoExo++, which extends beyond 2D exocentric view synthesis (EgoExo) to augment a piecewise-planar 2.5D ground surface estimation on-the-fly. Its anchor-free aerial viewpoint supports ground-relative reasoning, such as clearance and terrain-based navigation marker following. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. To assess operational benefits, we conduct two user studies with simulation and real-world data, each involving 15 participants, comparing baseline egocentric teleoperation and EgoExo++. Results indicate improved system usability (SUS), reduced perceived workload (NASA-TLX), and significant gains in objective teleoperation performance, including 16% faster missions, 5-fold reduction in path deviation ratio, and fewer collision events (2 vs 5 across trials). Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics. 
The source packages for EgoExo++ are available at: https://github.com/uf-robopi/EgoExo.

Sci. Robotics 2026-06-10

Therapist-exoskeleton-patient interaction for gait therapy

Emek Barış Küçüktabak, Matthew R. Short, Lorenzo Vianello, Daniel Ludvig, Levi Hargrove, Kevin Lynch, Jose Pons

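A three-way therapist-exoskeleton-patient interaction framework for post-stroke gait rehabilitation that supports simultaneous multijoint assistance, reduces therapist strain, and provides objective feedback.

Highlight: Recasts the rehabilitation robot from a replacement for the therapist's hands into a partner in human-robot collaboration, with strong clinical fit.

Legged & Quadruped Robots · Medical, Soft & Micro Robots · Human-Robot Interaction
Abstract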

After a stroke, individuals often experience mobility impairments because of weakness and loss of independent joint control in the lower limbs. As a result, gait recovery becomes a primary goal of physical rehabilitation, traditionally achieved through high-intensity therapist-led training. However, conventional therapist-led approaches involving manual assistance or resistance can be physically demanding and limit interaction at multiple joints simultaneously. Robotic exoskeletons have emerged as a promising solution, enabling multijoint support, reducing therapist strain, and offering objective performance feedback. However, typical exoskeleton control strategies limit the physical therapist’s involvement and adaptability to the patient’s needs, which may hinder clinical adoption and outcomes. In this study, we introduce a gait rehabilitation paradigm based on physical human-robot-human interaction that we call therapist-exoskeleton-patient interaction (TEPI), in which a therapist and a patient with stroke are each equipped with a lower-limb exoskeleton virtually connected at the hips and knees via spring-damper elements. This connection enables bidirectional physical interaction, allowing the therapist to guide the patient’s movement while receiving real-time haptic feedback. We evaluated this approach with eight patients with chronic stroke using a within-subject design, comparing TEPI training with conventional therapist-guided mobilization during treadmill walking. Results showed that, compared with conventional therapy, TEPI led to greater joint range of motion, increased step length and height, similar muscle activation, and high self-reported motivation and enjoyment. These findings suggest that TEPI can integrate robotic precision with therapist intuition, offering a framework for enhancing gait rehabilitation outcomes in populations recovering from stroke.
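
The abstract says the two exoskeletons are virtually connected at the hips and knees via spring-damper elements, which gives bidirectional interaction. A minimal sketch of such a coupling for a single joint follows. The linear form, the gain values, and all names are assumptions made for illustration; the abstract does not give the actual control law or parameters.

```python
from dataclasses import dataclass


@dataclass
class JointState:
    angle: float     # joint angle in rad
    velocity: float  # joint velocity in rad/s


def coupling_torques(
    therapist: JointState,
    patient: JointState,
    stiffness: float,  # virtual spring gain in N*m/rad (hypothetical value)
    damping: float,    # virtual damper gain in N*m*s/rad (hypothetical value)
) -> tuple[float, float]:
    """Return (torque_on_therapist, torque_on_patient) for one coupled joint.

    The virtual element pulls each joint toward the other, so the two torques
    are equal in magnitude and opposite in sign. The therapist therefore feels
    the patient's deviation as haptic feedback while guiding the movement.
    """
    torque_on_patient = stiffness * (therapist.angle - patient.angle) + damping * (
        therapist.velocity - patient.velocity
    )
    return -torque_on_patient, torque_on_patient


if __name__ == "__main__":
    # Hypothetical snapshot: the therapist's knee is flexed further than the patient's.
    therapist_knee = JointState(angle=0.50, velocity=0.2)
    patient_knee = JointState(angle=0.30, velocity=0.0)
    on_therapist, on_patient = coupling_torques(
        therapist_knee, patient_knee, stiffness=20.0, damping=1.0
    )
    print(f"torque on patient:   {on_patient:+.2f} N*m")
    print(f"torque on therapist: {on_therapist:+.2f} N*m")
```

Raising the stiffness makes the patient follow the therapist more tightly, while lowering it leaves the patient more freedom; how the study tuned this trade-off is not stated in the abstract.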

Sci. Robotics 2026-06-10

Do people feel safe in a robot’s presence?

Melisa Yashinski

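Physiological and qualitative data reveal how safe people feel when a mobile robot approaches, informing safety design for human-robot coexistence.

Highlight: Turns "do people feel safe" into measurable data, a key link in service-robot design.

Navigation / SLAM / Autonomous Driving
Abstract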

Physiological and qualitative data reveal insights into human perceived safety of mobile robot encounters.

📐 Control & Dynamics · 170 papers

T-RO 2026-04-22 · Cited by 1

Optimal Energy Shaping and Force Amplification Framework for Task-Agnostic, Biomimetic Ankle Exoskeletons

Katharine Walters, Gray C. Thomas, Robert D. Gregg

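Task-agnostic control for backdrivable lower-limb and ankle exoskeletons: an optimal energy-shaping and force-amplification framework that approximates biological torque in an explainable way while preserving safety.

Highlight: Offers a way to balance explainability and safety in exoskeleton control.

Robot Learning · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract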

Task-agnostic controllers for backdrivable lower-limb exoskeletons aim to reliably mimic biological torque while seamlessly adapting to changing movement patterns. However, current approaches relying on hidden state estimators or neural networks lack explainability and safety guarantees, while force amplification methods risk instability with an inherent tradeoff between sensitivity and robustness to control inputs. Energy shaping control uses a kinematic model-based framework to provide predictable, stable assistance, though its traditional passive form limits biomimetic performance. Previous work relaxed the strict passivity requirements to improve biomimicry but reduced the stability guarantees. This paper presents an optimization-based extension of the energy-shaping control framework that combines the stability benefits of energy shaping with the intuitive biomimicry of force amplification. Our framework enables controlled trade-offs between sensitivity to changing human impedance and high performance through adjustable cost contributions of force amplification and model-based terms. We provide theoretical guarantees of closed-loop stability to an invariant set under human joint impedance control, supported by empirical validation of stability characteristics of an ankle exoskeleton under varying controller passivity constraints. A study of ten able-bodied participants using bilateral ankle exoskeletons demonstrates that the biomimetic controller reduced biological ankle torque by 18.7% across various activities of daily life.

IJRR 2026-06-18

Energy-optimal linear quadratic tracking control for unmanned underwater vehicles in offshore aquaculture fish net-pen visual inspection

Thein Than Tun, Loulin Huang, Mark Anthony Preece

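For UUVs performing visual inspection of offshore aquaculture net pens, the paper proposes energy-optimal linear quadratic tracking control that extends working range and operating time on a limited battery.

Highlight: Writes the energy budget directly into the control objective; pragmatic work for field robots.

Control & Dynamics
Abstract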

Unmanned underwater vehicles (UUVs) have been deployed for fish net-pen visual inspection (FNVI) in offshore aquaculture. Limited energy capacity of onboard power supplies constrains the UUV’s working range and operating time. To minimize the energy consumption by the UUV during the FNVI of the Blue Endeavour Project (an offshore salmon farm of the New Zealand King Salmon Company), an energy-optimal linear quadratic tracking (EO-LQT) control scheme is proposed in this paper. For EO-LQTs implementation, a new Linear-Parameter-Varying (LPV) system that approximates the nonlinear UUV dynamics model with an accuracy of approximately 99% regardless of the operating points in real-time, with the modified versions of Bhāskara I’s sine approximation and Shirali’s cosine approximation, is developed. The use of the Lagrangian under the Principle of Least Action with the UUV’s kinetic energy and the non-quadratic thruster power function in the EO-LQT performance index (PI) is demonstrated. The steps to solve the Hamilton-Jacobi-Bellman (HJB) equation with the non-quadratic Hamiltonian H are detailed to derive the new analytical EO-LQT optimal control form. Five EO-LQT controllers with different PIs are tested against the conventional LQT (CO-LQT) controller in both high-fidelity simulations under simulated disturbance speed up to 0.9 m/s and pool experiments, reducing energy consumption up to 37.1%. As key comparison metrics for the pose tracking and energy consumption, the mean-absolute-error (MAE) and T200 thruster power function are used to validate the effectiveness of the proposed EO-LQT controllers, compared to the CO-LQT controller.
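For readers unfamiliar with the trigonometric approximation named in the abstract, the sketch below shows the classical form of Bhāskara I's sine approximation, sin(x) ≈ 16x(π − x) / (5π² − 4x(π − x)) on [0, π]. The paper uses modified versions that are not reproduced here; the function name and the error scan are illustrative only.

```python
import math


def bhaskara_sin(x):
    """Classical Bhaskara I rational approximation of sin(x) for 0 <= x <= pi."""
    p = x * (math.pi - x)
    return 16.0 * p / (5.0 * math.pi ** 2 - 4.0 * p)


# Scan the valid interval and record the worst absolute error.
samples = [i * math.pi / 1000 for i in range(1001)]
max_err = max(abs(bhaskara_sin(x) - math.sin(x)) for x in samples)
```

A rational form like this avoids calling a trigonometric routine at every control step, which is one plausible reason for using it in a real-time linear-parameter-varying model.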

IJRR 2026-06-06

Control of the uncertain fully flexible link-joint robot manipulators: A free-drift adaptive fractional-order robust approach

Seyed Jalal Aldin Hoseini, Mazda Moattari, Saeed Zaare

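For uncertain fully flexible link-joint manipulators, the paper proposes a free-drift adaptive fractional-order robust controller that achieves fast, low-vibration tracking.

Highlight: Robust control for highly underactuated flexible arms, with equal weight on theory and vibration suppression.

Manipulation & Robot Arms · Control & Dynamics
Abstract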

This paper presents a fast and low-vibration tracking control strategy for an uncertain fully flexible link-joint (FFLJ) robot manipulator. Due to the highly underactuated nature of the system, along with the presence of uncertainties and external disturbances, a two-time-scale singular perturbation (SP) approach is adopted to decompose the system into slow and fast subsystems. To control the slow subsystem, a fractional-order fast terminal sliding mode control (FOFTSMC) is designed, ensuring rapid convergence with minimal transient and steady-state errors, which is essential for vibration suppression. Additionally, a free-drift partially adaptive super twisting reaching law is incorporated to prevent overestimation of control inputs, mitigate uncertainties and disturbances, and reduce chattering while optimizing energy efficiency. For the fast subsystem, a linear state-space representation is formulated based on the slow subsystem’s control input, explicitly considering uncertainties. An optimal proportional derivative linear quadratic regulator (PD-LQR) is then employed to regulate the fast subsystem dynamics, leading to a robust composite control scheme. A rigorous stability analysis guarantees the global asymptotic stability of both subsystems and the overall closed-loop control system. Simulation results confirm the effectiveness of the proposed strategy in handling nonlinearities, underactuation, and uncertainties. Compared to the FOFTSMC and the integer order robust fuzzy SMC (IORFSMC), the proposed free-drift adaptive FOFTSMC (AFOFTSMC) demonstrates superior performance, achieving approximately 45% and 85% improvement, respectively, in the first link tracking, and 4% and 65% improvement, respectively, in the second link tracking in terms of the integral of time multiplied by absolute error (ITAE) index. 
These results highlight the proposed approach’s robustness, efficiency, and capability to ensure precise trajectory tracking, vibration suppression, and chattering reduction while maintaining energy efficiency.
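The ITAE index used in the comparison above is the integral of time multiplied by absolute error, ITAE = ∫ t·|e(t)| dt, which penalizes errors that persist late in the response. Below is a minimal sketch of computing it from sampled data with the trapezoidal rule; the function name and the sample values are illustrative and do not come from the paper.

```python
def itae(times, errors):
    """Trapezoidal estimate of the integral of t * |e(t)| over the sampled interval."""
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        f_prev = times[i - 1] * abs(errors[i - 1])
        f_curr = times[i] * abs(errors[i])
        total += 0.5 * (f_prev + f_curr) * dt
    return total


# A unit-magnitude error held for 2 s: the exact value is the integral of t, i.e. 2.
score = itae([0.0, 1.0, 2.0], [1.0, -1.0, 1.0])
```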

All Papers

IJRR 2026-06-10 · Cited by 3

EgoExo++: Integrating on-demand exocentric visuals with 2.5D ground surface estimation for interactive teleoperation of underwater ROVs

Adnan Abdullah, Ruo Chen, Ioannis Rekleitis, Md Jahidul Islam

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Human-Robot Interaction / Teleoperation
Abstract

Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators’ field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we first propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. We further propose EgoExo++, which extends beyond 2D exocentric view synthesis (EgoExo) to augment a piecewise-planar 2.5D ground surface estimation on-the-fly. Its anchor-free aerial viewpoint supports ground-relative reasoning, such as clearance and terrain-based navigation marker following. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. To assess operational benefits, we conduct two user studies with simulation and real-world data, each involving 15 participants, comparing baseline egocentric teleoperation and EgoExo++. Results indicate improved system usability (SUS), reduced perceived workload (NASA-TLX), and significant gains in objective teleoperation performance, including 16% faster missions, 5-fold reduction in path deviation ratio, and fewer collision events (2 vs 5 across trials). Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics. 
The source packages for EgoExo++ are available at https://github.com/uf-robopi/EgoExo.

Sci. Robotics 2026-05-13 · Cited by 1

“Humanoids will soon replace most human workers”: A debate

Hesheng Wang, Michael Wang, Frank Park, Lijun Han, Huichan Zhao, XingXing Wang, et al.

Humanoid Robots
Abstract

At the 2025 International Conference of Intelligent Robots and Systems (IROS) in Zhejiang, China, leading researchers and industry experts debated whether humanoid robots will soon replace human workers. A summary of the points raised by the debaters for or against the claim is highlighted below.

Sci. Robotics 2026-06-10

Therapist-exoskeleton-patient interaction for gait therapy

Emek Barış Küçüktabak, Matthew R. Short, Lorenzo Vianello, Daniel Ludvig, Levi Hargrove, Kevin Lynch, et al.

Legged / Quadruped Robots · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract

After a stroke, individuals often experience mobility impairments because of weakness and loss of independent joint control in the lower limbs. As a result, gait recovery becomes a primary goal of physical rehabilitation, traditionally achieved through high-intensity therapist-led training. However, conventional therapist-led approaches involving manual assistance or resistance can be physically demanding and limit interaction at multiple joints simultaneously. Robotic exoskeletons have emerged as a promising solution, enabling multijoint support, reducing therapist strain, and offering objective performance feedback. However, typical exoskeleton control strategies limit the physical therapist’s involvement and adaptability to the patient’s needs, which may hinder clinical adoption and outcomes. In this study, we introduce a gait rehabilitation paradigm based on physical human-robot-human interaction that we call therapist-exoskeleton-patient interaction (TEPI), in which a therapist and a patient with stroke are each equipped with a lower-limb exoskeleton virtually connected at the hips and knees via spring-damper elements. This connection enables bidirectional physical interaction, allowing the therapist to guide the patient’s movement while receiving real-time haptic feedback. We evaluated this approach with eight patients with chronic stroke using a within-subject design, comparing TEPI training with conventional therapist-guided mobilization during treadmill walking. Results showed that, compared with conventional therapy, TEPI led to greater joint range of motion, increased step length and height, similar muscle activation, and high self-reported motivation and enjoyment. These findings suggest that TEPI can integrate robotic precision with therapist intuition, offering a framework for enhancing gait rehabilitation outcomes in populations recovering from stroke.

Sci. Robotics 2026-06-10

From ball to rover: Transformable palm-sized rover SORA-Q for autonomous lunar exploration

D. Hirano, M. Inazawa, M. Sutoh, M. Nagata, Y. Yoneda, K. Watanabe, et al.

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Human-Robot Interaction / Teleoperation
Abstract

Robotic technologies are expected to drive substantial advancements in planetary exploration and resource prospecting by performing a variety of tasks in extraterrestrial environments. In particular, miniature robots are ideally suited for integration into spacecraft with strict payload limitations, providing a cost-effective solution. However, the pursuit of autonomous exploration using these miniature robots presents challenges owing to constraints in computational power and battery capacity and reduced locomotion performance owing to their small size. Here, we introduce a two-wheeled centimeter-scale rover, designated Lunar Excursion Vehicle 2 (LEV-2), also known as SORA-Q (named after the Japanese words for space and sphere), which transforms into a wheeled configuration from a compact spherical form, enabling efficient traversal of soft lunar terrains. On 19 January 2024 (universal time coordinated), LEV-2 was deployed from a Japanese lunar lander, Smart Lander for Investigating Moon (SLIM), immediately before its landing on the lunar surface. After a lunar landing, the palm-sized rover accomplished autonomous lunar exploration by navigating around the SLIM lander, capturing images of both the SLIM lander and its environment and transmitting selected images through wireless communication on the lunar surface without reliance on ground-based teleoperation. This study details the system design of LEV-2 and presents the results of its in situ lunar activities, highlighting the efficacy of the proposed technologies necessary for mission implementation. Furthermore, we discuss the technical challenges encountered during the mission, including operational constraints and partial data loss, as well as the lessons learned for future exploration missions using small-scale space robots.

Sci. Robotics 2026-06-10

Precise aggressive aerial maneuvers with sensorimotor policies

Tianyue Wu, Guangtong Xu, Zihan Wang, Junxiao Lin, Tianyang Chen, Yuze Wu, et al.

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Precise aggressive maneuvers with lightweight onboard sensors remain a key bottleneck in fully exploiting the maneuverability of drones. Such maneuvers are critical for expanding the systems’ accessible area by navigating through narrow openings in the environment. One of the most relevant problems is aggressive traversal through narrow gaps with quadrotors under constraints in the special Euclidean group of three dimensions [SE(3)], which requires the quadrotors to leverage a momentary tilted attitude and the asymmetry of the airframes to navigate through gaps. Here, we achieved such maneuvers by developing sensorimotor policies directly mapping onboard vision and proprioception into low-level control commands. The policies were trained using reinforcement learning (RL) with end-to-end policy distillation in simulation. We mitigated the model-free RL’s exploration challenge on the restricted solution space with an initialization strategy leveraging trajectories generated by a model-based planner. Careful sim-to-real design allowed the policy to control a quadrotor through narrow gaps with low clearances and high repeatability. For instance, the proposed method enabled a quadrotor to navigate a rectangular gap at a 5-centimeter clearance, tilted at an orientation up to 90°, without knowledge of the gap’s position or orientation. Without training on dynamic gaps, the policy could reactively servo the quadrotor to traverse through a moving gap. The proposed method was validated on challenging tracks of narrow, closely placed gaps. The flexibility of the policy learning method was demonstrated by developing policies on geometrically diverse gaps without relying on manually defined traversal poses and visual features.

Sci. Robotics 2026-06-10

Translational bottlenecks for biohybrid microrobots

Liangfang Zhang, Joseph Wang

Medical / Soft / Micro-Nano
Abstract

Rapid advances in biohybrid microrobots have prompted focused examination of the barriers to their clinical translation.

Sci. Robotics 2026-06-10

Do people feel safe in a robot’s presence?

Melisa Yashinski

Navigation / SLAM / Autonomous Driving
Abstract

Physiological and qualitative data reveal insights into human perceived safety of mobile robot encounters.

IJRR 2026-05-03 · Cited by 1

K2MUSE: A human lower-limb multimodal walking dataset spanning task and acquisition variability for rehabilitation robotics

Jiwei Li, Bi Zhang, Xiaowei Tan, Wanxin Chen, Zhaoyuan Liu, Juanjuan Zhang, et al.

Legged / Quadruped Robots · Robot Learning · Medical / Soft / Micro-Nano
Abstract

The natural interaction and control performance of lower limb rehabilitation robots are closely linked to biomechanical information from various human locomotion activities. Multidimensional human motion data significantly deepen the understanding of the complex mechanisms governing neuromuscular alterations, thereby facilitating the development and application of rehabilitation robots in multifaceted real-world environments. However, existing lower limb datasets are inadequate for supplying the essential multimodal data and large-scale gait samples necessary for the development of effective data-driven approaches, and the significant effects of acquisition interference in real applications are neglected. To fill this gap, we present the K2MUSE dataset, which includes a comprehensive collection of multimodal data, comprising kinematic, kinetic, amplitude mode ultrasound (AUS), and surface electromyography (sEMG) measurements. The proposed dataset includes lower limb multimodal data collected from two cohorts, including 30 able-bodied young adults and 12 older adults, across different inclines (0°, ±5°, and ±10°), speeds (0.5 m/s, 1.0 m/s, and 1.5 m/s), and representative non-ideal acquisition conditions (muscle fatigue, electrode shifts, and interday differences). The kinematic and ground reaction force data were collected with a Vicon motion capture system and an instrumented treadmill with embedded force plates, whereas the sEMG and AUS data of 13 muscles on the bilateral lower limbs were synchronously recorded. To validate the quality of the data, we quantified repeatability across locomotion modes, speeds, and inclines, examined physiological signatures under non-ideal acquisition conditions, and observed high agreement with existing public datasets. We also report baseline motion-intention recognition results, including joint angle estimation and gait phase classification, with a multimodal transformer model demonstrating accurate and stable performance. 
In addition, we present a control-oriented application in which an end-to-end model trained on the K2MUSE dataset provides hip assistance via a soft exoskeleton and yields consistent reductions in metabolic cost across multiple terrains. K2MUSE is released with the corresponding structured documentation, preprocessing pipelines, and example code, thereby providing a comprehensive resource for rehabilitation robot development, biomechanical analysis, and wearable sensing research. The dataset is available at https://k2muse.github.io/.

T-RO 2026-06-08

Parallel-Elastic Actuation with Reactive Latch Elevates Robotic Hopping Performance: Jump Height and Continuity

Songnan Bai, Runze Ding, Song Li, Ruihan Jia, Ruobing Wang, Zhiyuan Zhang, et al.

UAVs / Aerial Robots · Legged / Quadruped Robots · Control & Dynamics
Abstract

While many animals exhibit impressive hopping capabilities, machines have struggled to match their performance. Current hopping robots face limitations in power density, energy efficiency, and control stability. Here, we present a parallel-elastic actuation mechanism with a reactive latch that optimizes energy transfer, enabling a legged robot to achieve hopping heights and continuity previously unattainable. This mechanism efficiently stores and releases energy, extending the actuation period over the aerial phase while minimizing stance time. Our robot achieves a maximum hopping height of 3.6 meters, surpassing both human and animal records while demonstrating sustained, high-frequency hopping cycles with minimal power requirement. By integrating inertia-based onboard sensorimotor autonomy, we demonstrate stable, controlled hopping in environments without external aid. These results represent a step toward bridging the performance gap between biological and robotic locomotion, with potential to influence the design of future legged systems.
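For a sense of scale, a back-of-the-envelope ballistic estimate gives the takeoff speed that a 3.6 m hop implies. This is an illustrative calculation, not a figure from the paper: it neglects air drag and assumes that 3.6 m is the rise of the center of mass after leaving the ground.

```latex
\[
v_0 = \sqrt{2 g h}
    = \sqrt{2 \times 9.81\,\mathrm{m/s^2} \times 3.6\,\mathrm{m}}
    \approx 8.4\,\mathrm{m/s}
\]
```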

T-RO 2026-06-08

MobileROS: A Wireless-Native Robot Operating System for Mobile Robotics

Boyi Liu, Qianyi Zhang, Yongguang Lu, Jianhao Jiao, Jagmohan Chauhan, Wen Wu, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Multi-Robot / Swarms
Abstract

The increasing deployment of mobile robots in dynamic outdoor environments necessitates robotic systems capable of maintaining reliability amidst fluctuating wireless connectivity. While the Robot Operating System (ROS) has established itself as the de facto standard for such networked robotics, its abstraction of communication as an opaque, best-effort utility creates a critical bottleneck: it fails to leverage physical layer (PHY) information, resulting in degraded performance and unreliable execution in fluctuating networks. To address this, this paper presents MobileROS, a wireless-native robot operating system that transforms wireless communication from an external service into a core system resource. Grounded in the Symbiotic Paradigm, MobileROS establishes a bidirectional exchange where network conditions inform robotic decisions and mission requirements guide network resource allocation. Based on service mesh principles and domain-driven design, our architecture implements a Hub-Engines-Cells (HEC) model. It features a central Hub for global optimization, three specialized engines (the Radio Information Engine, the Cross Domain Engine, and the Physical Adaptive Engine) for cross-layer intelligence, and distributed Cells as functional units. A key mechanism, Application-Driven Bidirectional Dynamic Slicing, allows robots to actively reconfigure network resources based on semantic urgency, transforming the robot from a passive observer into an active network controller. We systematically evaluate MobileROS across three cities (London, Hong Kong, and Shenzhen) in five scenarios: distributed visual SLAM, cross-domain LiDAR perception, V2X autonomous driving, hybrid multi-robot collaboration against WebRTC baselines, and partition recovery validating CAP-theorem-aware failsafe mechanisms. Results demonstrate that MobileROS maintains significantly more stable performance than standard ROS in mobile wireless deployments. We provide implementation details at https://github.com/MobileROS.

T-RO 2026-06-08

Zero-Shot Sim-to-Real 6-DoF Pose Estimation for Underwater Vehicles Based on Uncertainty-Guided Dense Correspondence

Qingbo Wei, Yi Yang, Xingqun Zhou, Zhiqiang Hu, Chuanzhi Fan, Quan Zheng, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Multi-Robot / Swarms
Abstract

Accurate 6-DoF relative pose estimation is essential for multi-AUV cooperative tasks. However, pose-annotated underwater data are difficult to obtain, limiting learning-based methods. We present ZUPose, a monocular pose estimator trained entirely on synthetic data and deployed directly in real underwater scenes. A key challenge is prediction noise introduced by the sim-to-real gap and underwater optical degradation. To address it, we adopt an algorithm-data co-design strategy. At the algorithm level, we develop an uncertainty-guided dense-correspondence framework in which the network jointly predicts dense correspondences and per-pixel uncertainty under a Laplace-based probabilistic formulation. The predicted uncertainty acts as a learned scale parameter to model correspondence noise and guide pose optimization. At the data level, we construct a physics-guided simulation pipeline to model underwater optical degradation and generate diverse synthetic images. In real turbid water, ZUPose achieves translation and rotation errors of 6.7 cm and 7.7°, with both reduced by about half compared with the best-performing baseline. The method remains stable under overexposure and long-range observation, and dual-AUV navigation experiments further validate its practical viability.
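The Laplace-based formulation can be illustrated with the per-correspondence negative log-likelihood |r|/b + log(2b), where r is the correspondence residual and b the predicted scale (the uncertainty). The sketch below uses only the standard Laplace density; the function name and numbers are illustrative, and ZUPose's actual loss, network, and pose optimizer are not reproduced.

```python
import math


def laplace_nll(residual, scale):
    """Negative log-likelihood of a residual under a zero-mean Laplace distribution."""
    return abs(residual) / scale + math.log(2.0 * scale)


# For a fixed residual of 1.0, compare an overconfident, a matched,
# and an underconfident predicted scale.
nll_overconfident = laplace_nll(1.0, 0.5)
nll_matched = laplace_nll(1.0, 1.0)
nll_underconfident = laplace_nll(1.0, 2.0)
```

For a fixed residual the loss is lowest when the scale equals the residual magnitude, so a network trained this way is pushed to report large uncertainty where its correspondences are poor. A downstream optimizer can then down-weight those pixels, for example by 1/b.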

T-RO 2026-06-08

STTRL-DVO: Transformer-based Reinforcement Learning for Robust Dynamic Target Tracking in Cluttered Environments

Fanghao Wang, Binghong Chen, Youchao Zhang, Xiangyu Guo, Yining Lyu, Chuanjie Liu, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Magnetic microrobots capable of autonomous operation hold significant promise for critical cell and small creature manipulation tasks, including trapping, transportation, sorting, etc. Although conventional microrobot navigation methods have shown notable performance, they often lack adaptability to novel environments. To address these limitations, we propose a learning-based framework for real-world microrobot navigation and dynamic target tracking. Our approach employs spatial-temporal transformer reinforcement learning (STTRL) with a deterministic velocity obstacle (DVO) that processes historical navigation states and virtual light detection and ranging (LiDAR) scans to predict optimal actions. The key innovation lies in the model's ability to extract and utilize contextual information from observation histories, enabling adaptive behavior even in previously unseen environments. Through large-scale model-free reinforcement learning trained in randomized simulation environments, our method achieves remarkable real-world performance with zero-shot transfer capability. Experimental results demonstrate superior navigation agility with an 89.8% success rate in the base environment, representing a 7.4% improvement over state-of-the-art (SOTA) algorithms. Furthermore, the method exhibits robust generalization in diverse unseen environments, validating its adaptability to different environmental characteristics.

AuRo 2026-06-10 · Cited by 4

Large language models for multi-robot systems: a survey

Peihan Li, Zijian An, Shams Abrar, Lifeng Zhou

Navigation / SLAM / Autonomous Driving · Robot Learning · Multi-Robot / Swarms · Human-Robot Interaction / Teleoperation
Abstract

The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-Robot Systems (MRS), enabling enhanced communication, task allocation and planning, and human-robot interaction. Unlike traditional single-robot and multi-agent systems, MRS poses unique challenges, including coordination, scalability, and real-world adaptability. This survey provides the first dedicated review of LLM integration into MRS. It systematically categorizes their applications across high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. We highlight key applications in diverse domains, such as household robotics, construction, formation control, target tracking, and robot games, showcasing the versatility and transformative potential of LLMs in MRS. Furthermore, we examine the challenges that limit adapting LLMs to MRS, including mathematical reasoning limitations, hallucination, latency issues, and the need for robust benchmarking systems. Finally, we outline opportunities for future research, emphasizing advancements in fine-tuning, reasoning techniques, and task-specific models. This survey aims to guide researchers in the intelligence and real-world deployment of MRS powered by LLMs. Given the rapidly evolving nature of research in the field, we continuously update the paper list in the open-source GitHub repository.

IJRR 2026-06-08

Particle importance analysis: A fast localization method under indeterminate magnetic sources

Ruoyu Xu, Yichong Sun, Yimou Wu, Yehui Li, Zheng Li

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano
Abstract

For magnetically actuated robots driven by external magnetic sources (EMSs), rapid and stable localization of the internal permanent magnet (IPM) is essential for closed-loop control. Existing sensor-array-based localization methods require a fixed model with an offline-calibrated dipole moment to separate the magnetic field generated by the IPM from the measured total field. However, model accuracy varies with the distance to the sensor array, leading to inaccurate separation and unstable localization. Additionally, nonlinear optimization is typically used for localization with sensor arrays, which is computationally inefficient. We propose a sampling-based method that jointly estimates the dipole moments and the IPM position, thereby eliminating the need for magnetic field separation. Given the EMS positions and the measured total field, our particle importance analysis efficiently identifies the sampled particle that most closely approximates the dipole position and concurrently estimates the optimal dipole moments. Adaptive particle sampling enables high-accuracy localization using sparsely sampled particles, thereby improving computational efficiency. We validated the approach experimentally with EMSs actuated by manipulators and electromagnetic systems. With a 72-sensor array and the IPM located 20 cm above the sensor plane, the method achieved an average computation time of 1.2 ms and localization errors of 4.70 mm and 4.17°. Compared with an optimization-based method, our method reduced the maximum localization error by 59% and improved computational efficiency by a factor of 30. The method reliably localized in permanent magnet, electromagnetic, and hybrid magnetic fields. Finally, we applied the method to the simultaneous localization and actuation of a magnetic endoscope, validating its feasibility in a colon phantom.
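The idea of jointly estimating dipole moment and position by scoring sampled candidate positions can be sketched with the standard point-dipole model, in which the field at a sensor is linear in the moment once the position is fixed. This is a simplified stand-in and not the authors' particle importance analysis: it handles a single dipole with no external magnetic sources, scores candidates by brute force, and all names, geometry, and values are hypothetical.

```python
import numpy as np

MU0_OVER_4PI = 1e-7  # T*m/A


def dipole_matrix(sensor, source):
    """3x3 matrix A such that the field at `sensor` from a dipole at `source` is A @ moment."""
    r = sensor - source
    d = np.linalg.norm(r)
    return MU0_OVER_4PI * (3.0 * np.outer(r, r) / d**5 - np.eye(3) / d**3)


def best_particle(sensors, fields, particles):
    """Return (index, moment, residual) for the candidate that best explains the readings."""
    b = fields.reshape(-1)
    best = None
    for i, p in enumerate(particles):
        # With the position fixed, the moment follows from linear least squares.
        A = np.vstack([dipole_matrix(s, p) for s in sensors])
        m, *_ = np.linalg.lstsq(A, b, rcond=None)
        res = np.linalg.norm(A @ m - b)
        if best is None or res < best[2]:
            best = (i, m, res)
    return best


# Hypothetical setup: a 3x3 sensor grid in the z = 0 plane (metres).
grid = (-0.05, 0.0, 0.05)
sensors = np.array([[x, y, 0.0] for x in grid for y in grid])
true_pos = np.array([0.01, 0.02, 0.20])
true_moment = np.array([0.0, 0.0, 1.0])  # A*m^2
fields = np.array([dipole_matrix(s, true_pos) @ true_moment for s in sensors])

# Three candidate positions; the second one is the true position.
particles = np.array([[0.0, 0.0, 0.15], [0.01, 0.02, 0.20], [-0.03, 0.04, 0.25]])
idx, moment, residual = best_particle(sensors, fields, particles)
```

Solving for the moment in closed form at each candidate removes the need for a fixed, offline-calibrated dipole moment, which is the limitation of earlier methods that the abstract points out.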

T-RO 2026-06-04

Fault-Tolerant Multi-Modal Localization of Multi-Robots on Matrix Lie Groups

Mahboubeh Zarei, Robin Chhabra

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Multi-Robot / Swarms
Abstract

Consistent localization of cooperative multi-robot systems during navigation presents substantial challenges. This paper proposes a fault-tolerant, multi-modal localization framework for multi-robot systems on matrix Lie groups. We introduce novel stochastic operations to perform composition, differencing, inversion, averaging, and fusion of correlated and non-correlated estimates on Lie groups, enabling pseudo-pose construction for filter updates. The method integrates a combination of proprioceptive and exteroceptive measurements from inertial, velocity, and pose (pseudo-pose) sensors on each robot in an Extended Kalman Filter (EKF) framework. The prediction step is conducted on the Lie group $\mathbb{SE}_{2}(3) \times \mathbb{R}^{3} \times \mathbb{R}^{3}$, where each robot's pose, velocity, and inertial measurement biases are propagated. The proposed framework uses body velocity, relative pose measurements from fiducial markers, and inter-robot communication to provide scalable EKF update across the network on the Lie group $\mathbb{SE}(3) \times \mathbb{R}^{3}$. A fault detection module is implemented, allowing the integration of only reliable pseudo-pose measurements from fiducial markers. We demonstrate the effectiveness of the method through experiments with a network of wheeled mobile robots equipped with inertial measurement units, wheel odometry, and ArUco markers. The comparison results highlight the proposed method's real-time performance, superior efficiency, reliability, and scalability in multi-robot localization, making it well-suited for large-scale robotic systems.

Sci. Robotics 2026-05-20

Extreme dynamic symmetry enables omnidirectional and multifunctional robots

Jiaxun Liu, Boxi Xia, Boyuan Chen

Legged / Quadruped Robots · Perception & Sensing · Control & Dynamics
Abstract

Symmetry is a central organizing principle in natural systems, yet its use as a unifying design strategy in robotics has largely remained limited to geometric form. We show that symmetry can instead be leveraged at the level of dynamic actuation capability. We introduce dynamic symmetry, the uniformity of a robot’s attainable center-of-mass accelerations, and formalize it through a measure coined as dynamic isotropy. Across more than 1000 simulated morphologies, we found that higher dynamic symmetry consistently improved trajectory tracking, task success, robustness, resiliency, and energy efficiency, with the benefits becoming most pronounced as dynamic isotropy approached its theoretical limit. To study this regime systematically, we developed Argus, a family of spherical robots designed to explore the effects of increasing dynamic symmetry. Members of the Argus family vary in their actuation geometry and dynamic symmetry level while sharing a common architectural principle: radially oriented linear actuators that directly shape the robot’s center-of-mass dynamics. Among them, we built a physical 20-leg Argus variant that achieved near-extreme dynamic isotropy and demonstrated orientation-invariant locomotion, agile traversal of cluttered and deformable terrain, rapid self-stabilization, and resilience to partial actuator failures. Its distributed sensing further enabled omnidirectional perception and object interaction during continuous motion. These results show that designing robots for symmetry not only in morphology but also in their attainable dynamics provides a powerful and general pathway toward agility, robustness, and multifunctionality in uncertain terrestrial and extraterrestrial environments.

Sci. Robotics 2026-05-20

A minimally invasive robotic spinal surgical system for anterior lumbar nerve decompression

Qingxiang Zhao, Xiandi Wang, Xin Zhong, Runfeng Zhu, Peizhi Zhou, Dan Pu, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Medical / Soft / Micro-Nano
Abstract

Lumbar degenerative diseases, primarily caused by pathological tissues compressing spinal nerves, typically necessitate surgical intervention—specifically lumbar nerve decompression—to alleviate pain. Although the anterior decompression approach demonstrates notable advantages, such as reduced bleeding and shorter postoperative hospitalization stays, compared with the conventional posterior approach, patients may still experience incomplete decompression because of various instrumental shortcomings, including restricted visibility and insufficiency of distal dexterity. In this study, we present a robotic surgical system for minimally invasive anterior lumbar nerve decompression, which comprises three slender robotic arms (2 millimeters in outer diameter) with high dexterity (18 degrees of freedom), facilitating effective navigation through the narrow intervertebral disc space to reach the posterior area. Each robot arm is based on concentric push-pull robot structure, forming three robotized instruments: an endoscope for visualization, a laser optical fiber for hemostasis and resection, and a gripper for tissue manipulation. These components are integrated through the hollow lumen of a slender trocar, and multi-instrument coordination enables effective decompression procedure with wide view. System performance was first validated using a three-dimensional–printed vertebral phantom model to confirm accessibility to bilateral articular processes. Subsequently, in vivo animal experiment and human cadaver tests were conducted to further demonstrate the full capabilities in performing minimally invasive lumbar nerve decompression. This study demonstrates the potential of the robotic system to facilitate surgical procedures in narrow, confined, and tortuous anatomical spaces, addressing the key limitations of conventional instruments in anterior lumbar nerve decompression.

T-RO 2026-04-22 · Cited by 1

Optimal Energy Shaping and Force Amplification Framework for Task-Agnostic, Biomimetic Ankle Exoskeletons

Katharine Walters, Gray C. Thomas, Robert D. Gregg

Robot Learning · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Task-agnostic controllers for backdrivable lower-limb exoskeletons aim to reliably mimic biological torque while seamlessly adapting to changing movement patterns. However, current approaches relying on hidden state estimators or neural networks lack explainability and safety guarantees, while force amplification methods risk instability with an inherent tradeoff between sensitivity and robustness to control inputs. Energy shaping control uses a kinematic model-based framework to provide predictable, stable assistance, though its traditional passive form limits biomimetic performance. Previous work relaxed the strict passivity requirements to improve biomimicry but reduced the stability guarantees. This paper presents an optimization-based extension of the energy-shaping control framework that combines the stability benefits of energy shaping with the intuitive biomimicry of force amplification. Our framework enables controlled trade-offs between sensitivity to changing human impedance and high performance through adjustable cost contributions of force amplification and model-based terms. We provide theoretical guarantees of closed-loop stability to an invariant set under human joint impedance control, supported by empirical validation of stability characteristics of an ankle exoskeleton under varying controller passivity constraints. A study of ten able-bodied participants using bilateral ankle exoskeletons demonstrates that the biomimetic controller reduced biological ankle torque by 18.7% across various activities of daily life.

IJRR 2026-06-18

Energy-optimal linear quadratic tracking control for unmanned underwater vehicles in offshore aquaculture fish net-pen visual inspection

Thein Than Tun, Loulin Huang, Mark Anthony Preece

Control & Dynamics
Abstract

Unmanned underwater vehicles (UUVs) have been deployed for fish net-pen visual inspection (FNVI) in offshore aquaculture. Limited energy capacity of onboard power supplies constrains the UUV’s working range and operating time. To minimize the energy consumption by the UUV during the FNVI of the Blue Endeavour Project (an offshore salmon farm of the New Zealand King Salmon Company), an energy-optimal linear quadratic tracking (EO-LQT) control scheme is proposed in this paper. For the EO-LQT implementation, a new Linear-Parameter-Varying (LPV) system that approximates the nonlinear UUV dynamics model with an accuracy of approximately 99% regardless of the operating points in real-time, with the modified versions of Bhāskara I’s sine approximation and Shirali’s cosine approximation, is developed. The use of the Lagrangian under the Principle of Least Action with the UUV’s kinetic energy and the non-quadratic thruster power function in the EO-LQT performance index (PI) is demonstrated. The steps to solve the Hamilton-Jacobi-Bellman (HJB) equation with the non-quadratic Hamiltonian H are detailed to derive the new analytical EO-LQT optimal control form. Five EO-LQT controllers with different PIs are tested against the conventional LQT (CO-LQT) controller in both high-fidelity simulations under simulated disturbance speed up to 0.9 m/s and pool experiments, reducing energy consumption up to 37.1%. As key comparison metrics for the pose tracking and energy consumption, the mean-absolute-error (MAE) and T200 thruster power function are used to validate the effectiveness of the proposed EO-LQT controllers, compared to the CO-LQT controller.
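
For readers unfamiliar with the sine approximation the abstract mentions, here is a minimal sketch of the classical Bhāskara I formula, sin(x) ≈ 16x(π − x) / (5π² − 4x(π − x)) on [0, π]. The paper uses modified versions that are not reproduced here, and the function names below are hypothetical. The approximation is rational in x, which is presumably what makes it attractive for building an LPV model without trigonometric terms.

```python
import math

def bhaskara_sin(x: float) -> float:
    """Classical Bhaskara I approximation of sin(x), valid for 0 <= x <= pi."""
    p = x * (math.pi - x)
    return 16.0 * p / (5.0 * math.pi ** 2 - 4.0 * p)

def max_abs_error(samples: int = 10_000) -> float:
    """Largest |approximation - sin| over an even grid on [0, pi]."""
    worst = 0.0
    for i in range(samples + 1):
        x = i * math.pi / samples
        worst = max(worst, abs(bhaskara_sin(x) - math.sin(x)))
    return worst
```

The formula is exact at 0, π/6, π/2, 5π/6, and π, and `max_abs_error()` stays below 0.002 over the whole interval.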

IJRR 2026-06-12

2Fast-2Lamaa: Large-scale lidar-inertial localization and mapping with continuous distance fields

Cedric Le Gentil, Raphael Falque, Daniil Lisus, Timothy D. Barfoot

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

This paper introduces 2Fast-2Lamaa, a lidar-inertial state estimation framework for odometry, mapping, and localization. Its first key component is the optimization-based undistortion of lidar scans, which uses continuous IMU preintegration to model the system’s pose at every lidar point timestamp. The continuous trajectory over 100–200 ms is parameterized only by the initial scan conditions (linear velocity and gravity orientation) and IMU biases, yielding eleven state variables. These are estimated by minimizing point-to-line and point-to-plane distances between lidar-extracted features without relying on previous estimates, resulting in a prior-less motion-distortion correction strategy. Because the method performs local state estimation, it directly provides scan-to-scan odometry. To maintain geometric consistency over longer periods, undistorted scans are used for scan-to-map registration. The map representation employs Gaussian Processes to form a continuous distance field, enabling point-to-surface distance queries anywhere in space. Poses of the undistorted scans are refined by minimizing these distances through non-linear least-squares optimization. For odometry and mapping, the map is built incrementally in real time; for pure localization, existing maps are reused. The incremental map construction also includes mechanisms for removing dynamic objects. We benchmark 2Fast-2Lamaa on over 750 km of public and self-collected datasets from both automotive and handheld systems. The framework achieves state-of-the-art performance across diverse and challenging scenarios, reaching odometry and localization errors as low as 0.22% and 0.06 m, respectively. The real-time implementation is publicly available at https://github.com/clegenti/2fast2lamaa .

IJRR 2026-06-06

Linking exteroception and proprioception through improved contact modeling for soft growing robots

Francesco Fuentes, Serigne Diagne, Zachary Kingston, Laura H. Blumenschein

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Passive deformation due to compliance is a commonly used benefit of soft robots, providing opportunities to achieve robust actuation with few active degrees of freedom. Soft growing robots in particular have shown promise in navigation of unstructured environments due to their passive deformation. If their collisions and subsequent deformations can be better understood, soft robots could be used to understand the structure of the environment from direct tactile measurements. In this work, we propose the use of soft growing robots as mapping and exploration tools. We do this by first characterizing collision behavior during discrete turns, then leveraging this model to develop a geometry-based simulator that models robot trajectories in 2D environments. Finally, we demonstrate the model and simulator validity by mapping unknown environments using Monte Carlo sampling to estimate the optimal next deployment given current knowledge. Over both uniform and non-uniform environments, this selection method rapidly approaches ideal actions, showing the potential for soft growing robots in unstructured environment exploration and mapping.

T-RO 2026-06-08

Cross-Behavior Learning with Object Flow Prediction for Robotic Manipulation

Longrui Chen, David Russell, Yulei Qiu, Mehmet Dogar

Manipulation & Robot Arms · Robot Learning
Abstract

Cross-embodiment learning enables robots to acquire manipulation skills by learning from demonstrations provided by different embodiments. However, most existing research on cross-embodiment learning focuses on transferring similar manipulation behaviors. For the same task, different embodiments may need different behaviors; e.g., it may be easier for a human to push an object to a goal position, while a robot may use a pick-and-place to perform the same task. Making use of such cross-embodiment and cross-behavior demonstrations becomes essential for large-scale imitation learning. In this work, we propose a novel framework for cross-behavior learning based on object flow. Object flow represents the task in a manner that is independent from the embodiment and the particular behavior used during the demonstration. By shifting the focus from the manipulator to objects, our framework enables learning from cross-behavior manipulation data rather than merely imitating the behavior of a specific embodiment. Our results on task representations show that, with only 20 robot demonstrations, integrating object flow prediction improves success rates by up to 91% in-domain and 87% out-of-domain. Adding 40 human demonstrations in addition to the 20 robot demonstrations further boosts out-of-domain performance by 143%.

T-RO 2026-06-02

LiDAR Teach, Radar Repeat: Robust Cross-Modal Navigation in Degenerate and Varying Environments

Renxiang Xiao, Yichen Chen, Yuanfan Zhang, Qianyi Shao, Yushuai Chen, Yuxuan Han, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Control & Dynamics
Abstract

Long-term autonomy requires robust navigation in environments subject to dynamic and static changes, as well as adverse weather conditions. Teach-and-Repeat (T&R) navigation offers a reliable and cost-effective solution by avoiding the need for consistent global mapping; however, existing T&R systems lack a systematic solution to tackle various environmental variations such as weather degradation, ephemeral dynamics, and structural changes. This work proposes LTR$^{2}$, the first cross-modal, cross-platform LiDAR-Teach-and-Radar-Repeat system that systematically addresses these challenges. LTR$^{2}$ leverages LiDAR during the teaching phase to capture precise structural information under normal conditions and utilizes 4D millimeter-wave radar during the repeating phase for robust operation under environmental degradations. To align sparse and noisy forward-looking 4D radar with dense and accurate omnidirectional 3D LiDAR data, we introduce a Cross-Modal Registration (CMR) network that jointly exploits Doppler-based motion priors and the physical laws governing LiDAR intensity and radar power density. Furthermore, we propose an adaptive fine-tuning strategy that incrementally updates the CMR network based on localization errors, enabling long-term adaptability to static environmental changes without ground-truth labels. We demonstrate that the proposed CMR network achieves state-of-the-art cross-modal registration performance on the open-access dataset. Then we validate LTR$^{2}$ across three robot platforms over a large-scale, long-term deployment (40+ km over 6 months), including challenging conditions such as nighttime smoke. Experimental results and ablation studies demonstrate centimeter-level accuracy and strong robustness against diverse environmental disturbances, significantly outperforming existing approaches.

T-RO 2026-06-02

MaskPlanner: A Framework for 3D Learning-Based Object-Centric Motion Generation and Applications to Robotic Spray Painting

Gabriele Tiboni, Raffaello Camoriano, Tatiana Tommasi

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Object-Centric Motion Generation (OCMG) plays a key role in a variety of industrial applications, such as robotic spray painting and welding, requiring efficient, scalable, and generalizable algorithms to plan multiple long-horizon trajectories over free-form 3D objects. However, existing solutions rely on specialized heuristics, expensive optimization routines, or restrictive geometry assumptions that limit their adaptability to real-world scenarios. In this work, we introduce a novel, fully data-driven framework that tackles OCMG directly from 3D point clouds, learning to generalize expert path patterns across free-form surfaces. We propose MaskPlanner, a deep learning method that predicts local path segments for a given object while simultaneously inferring “path masks” to group these segments into distinct paths. This design induces the network to capture both local geometric patterns and global task requirements in a single forward pass. Extensive experimentation on a realistic robotic spray painting scenario shows that our approach attains near-complete coverage (above 99%) for unseen objects, while it remains task-agnostic and does not explicitly optimize for paint deposition. Moreover, our real-world validation on a 6-DoF specialized painting robot demonstrates that the generated paths are directly executable and yield expert-level painting quality. We additionally provide empirical evidence that our approach remains complementary to downstream trajectory optimization methods, and applicable to tasks beyond spray painting.

T-RO 2026-04-21 · Cited by 1

Multi-Robot Target Monitoring and Encirclement via Triggered Distributed Feedback Optimization

Lorenzo Pichierri, Guido Carnevale, Lorenzo Sforni, Giuseppe Notarstefano

UAVs / Aerial Robots · Multi-Robot / Swarms · Control & Dynamics
Abstract

We design a distributed feedback optimization strategy, embedded into a modular ROS 2 control architecture, which allows a team of heterogeneous robots to cooperatively monitor and encircle a target while patrolling points of interest. By relying on the aggregative feedback optimization framework, we handle multi-robot dynamics while minimizing a global performance index depending on both microscopic (e.g., the location of single robots) and macroscopic variables (e.g., the spatial distribution of the team). The proposed distributed policy allows the robots to cooperatively address the global problem by employing only local measurements and neighboring data exchanges. These exchanges are performed through an asynchronous communication protocol ruled by locally-verifiable triggering conditions. We formally prove that our strategy steers the robots to a set of configurations representing stationary points of the considered optimization problem. The effectiveness and scalability of the overall strategy are tested via Monte Carlo campaigns of realistic Webots ROS 2 virtual experiments. Finally, the applicability of our solution is shown with real experiments on ground and aerial robots.

IJRR 2026-06-03

Parkour in the wild: Learning a general and extensible agile locomotion policy using multi-expert distillation and RL fine-tuning

Nikita Rudin, Junzhe He, Joshua Aurand, Marco Hutter

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Legged robots are well-suited for navigating terrains inaccessible to wheeled robots, making them ideal for applications in search and rescue or space exploration. However, current control methods often struggle to generalize across diverse, unstructured environments. This paper introduces a novel framework for agile locomotion of legged robots by combining multi-expert distillation with reinforcement learning (RL) fine-tuning to achieve robust generalization. Initially, terrain-specific expert policies are trained to develop specialized locomotion skills. These policies are then distilled into a unified generalist policy via the DAgger algorithm. The distilled policy is subsequently fine-tuned using RL on a broader terrain set, including real-world 3D scans. The framework allows further adaptation to new terrains through repeated fine-tuning. The proposed policy leverages depth images as exteroceptive inputs, enabling robust navigation across diverse, unstructured terrains. Experimental results demonstrate significant performance improvements over existing methods in synthesizing multi-terrain skills into a single controller. Deployment on the ANYmal D robot validates the policy’s ability to navigate complex environments with agility and robustness, setting a new benchmark for legged robot locomotion.

IJRR 2026-06-08

TPT-Bench: A large-scale, long-term and robot-egocentric dataset for benchmarking target person tracking

Hanjing Ye, Yu Zhan, Weixi Situ, Guangcheng Chen, Jingwen Yu, Ziqi Zhao, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Tracking a target person from robot-egocentric views is crucial for developing autonomous robots that provide continuous personalized assistance or collaboration in human–robot interaction (HRI) and Embodied AI. However, most existing target person tracking (TPT) benchmarks are limited to controlled laboratory environments with few distractions, clean backgrounds, and short-term occlusions. In this paper, we introduce a large-scale dataset designed for TPT in crowded and unstructured environments, demonstrated through a robot-person following task. The dataset is collected by a human pushing a sensor-equipped cart while following a target person, capturing human-like following behavior and emphasizing long-term tracking challenges, including frequent occlusions and the need for re-identification from numerous pedestrians. It includes multi-modal data streams, including odometry, 3D LiDAR, IMU, panoramic images, and RGB-D images, along with exhaustively annotated 2D bounding boxes of the target person across 48 sequences, both indoors and outdoors. Using this dataset and visual annotations, we perform extensive experiments with existing SOTA TPT methods, offering a thorough analysis of their limitations and suggesting future research directions. Our dataset, code, and video are available at https://medlartea.github.io/tpt-bench/ .

T-RO 2026-06-09

A Third-Order Gaussian Process Trajectory Representation Framework with Closed-Form Kinematics for Continuous-Time Motion Estimation

Thien-Minh Nguyen, Ziyu Cao, Kailai Li, William Talbot, Tongxing Jin, Shenghai Yuan, et al.

Navigation / SLAM / Autonomous Driving
Abstract

In this paper, we propose a third-order, i.e., white-noise-on-jerk, Gaussian Process (GP) Trajectory Representation (TR) framework for continuous-time (CT) motion estimation (ME) tasks. Our framework features a unified trajectory representation that encapsulates the kinematic models of both SO(3)$\times\mathbb{R}^{3}$ and SE(3) pose representations. This encapsulation strategy allows users to use the same implementation of measurement-based factors for either choice of pose representation, which facilitates experimentation and comparison to make a better choice for the ME task. In addition, unique to our framework, we derive the kinematic models with the closed-form temporal derivatives of the local variables of SO(3) and SE(3), which so far has only been approximated based on Taylor expansion in the literature. Our experiments show that these kinematic models can improve the estimation accuracy in high-speed scenarios. All analytical Jacobians of the interpolated states with respect to the support states of the trajectory representation, as well as the motion prior factors, are also provided for accelerated Gauss-Newton (GN) optimization. Our experiments demonstrate the efficacy and efficiency of the framework in various motion estimation tasks such as localization, calibration, and odometry, facilitating fast prototyping for ME researchers. We release the source code for the benefit of the community. Our project is available at https://github.com/brytsknguyen/gptr.
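
As background for "white-noise-on-jerk", the block below gives the standard vector-space form of that prior for a single axis. It is a textbook sketch, not the paper's SO(3) or SE(3) derivation; the symbols $p$, $v$, $a$, $Q_c$, and $\Delta t$ are notation assumed here, not taken from the abstract. The state stacks position, velocity, and acceleration, and the jerk is modeled as zero-mean white noise with power spectral density $Q_c$.

```latex
\mathbf{x}(t) = \begin{bmatrix} p(t) \\ v(t) \\ a(t) \end{bmatrix},
\qquad
\dddot{p}(t) = w(t), \quad w(t) \sim \mathcal{GP}\bigl(0,\; Q_c\,\delta(t - t')\bigr)

\mathbf{x}(t + \Delta t) = \boldsymbol{\Phi}(\Delta t)\,\mathbf{x}(t) + \boldsymbol{\eta},
\qquad
\boldsymbol{\eta} \sim \mathcal{N}\bigl(\mathbf{0},\, \mathbf{Q}(\Delta t)\bigr)

\boldsymbol{\Phi}(\Delta t) =
\begin{bmatrix}
1 & \Delta t & \tfrac{1}{2}\Delta t^{2} \\
0 & 1 & \Delta t \\
0 & 0 & 1
\end{bmatrix},
\qquad
\mathbf{Q}(\Delta t) = Q_c
\begin{bmatrix}
\tfrac{1}{20}\Delta t^{5} & \tfrac{1}{8}\Delta t^{4} & \tfrac{1}{6}\Delta t^{3} \\
\tfrac{1}{8}\Delta t^{4} & \tfrac{1}{3}\Delta t^{3} & \tfrac{1}{2}\Delta t^{2} \\
\tfrac{1}{6}\Delta t^{3} & \tfrac{1}{2}\Delta t^{2} & \Delta t
\end{bmatrix}
```

Each entry of $\mathbf{Q}$ follows from integrating $\boldsymbol{\Phi}(\tau)\,\mathbf{L}\,Q_c\,\mathbf{L}^{\top}\boldsymbol{\Phi}(\tau)^{\top}$ over $\tau \in [0, \Delta t]$ with $\mathbf{L} = [0, 0, 1]^{\top}$. The motion prior factor between two support states penalizes the residual $\mathbf{x}(t + \Delta t) - \boldsymbol{\Phi}(\Delta t)\,\mathbf{x}(t)$ weighted by $\mathbf{Q}(\Delta t)^{-1}$.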

IJRR 2026-06-06

Control of the uncertain fully flexible link-joint robot manipulators: A free-drift adaptive fractional-order robust approach

Seyed Jalal Aldin Hoseini, Mazda Moattari, Saeed Zaare

Manipulation & Robot Arms · Control & Dynamics
Abstract

This paper presents a fast and low-vibration tracking control strategy for an uncertain fully flexible link-joint (FFLJ) robot manipulator. Due to the highly underactuated nature of the system, along with the presence of uncertainties and external disturbances, a two-time-scale singular perturbation (SP) approach is adopted to decompose the system into slow and fast subsystems. To control the slow subsystem, a fractional-order fast terminal sliding mode control (FOFTSMC) is designed, ensuring rapid convergence with minimal transient and steady-state errors, which is essential for vibration suppression. Additionally, a free-drift partially adaptive super twisting reaching law is incorporated to prevent overestimation of control inputs, mitigate uncertainties and disturbances, and reduce chattering while optimizing energy efficiency. For the fast subsystem, a linear state-space representation is formulated based on the slow subsystem’s control input, explicitly considering uncertainties. An optimal proportional derivative linear quadratic regulator (PD-LQR) is then employed to regulate the fast subsystem dynamics, leading to a robust composite control scheme. A rigorous stability analysis guarantees the global asymptotic stability of both subsystems and the overall closed-loop control system. Simulation results confirm the effectiveness of the proposed strategy in handling nonlinearities, underactuation, and uncertainties. Compared to the FOFTSMC and the integer order robust fuzzy SMC (IORFSMC), the proposed free-drift adaptive FOFTSMC (AFOFTSMC) demonstrates superior performance, achieving approximately 45% and 85% improvement, respectively, in the first link tracking, and 4% and 65% improvement, respectively, in the second link tracking in terms of the integral of time multiplied by absolute error (ITAE) index. These results highlight the proposed approach’s robustness, efficiency, and capability to ensure precise trajectory tracking, vibration suppression, and chattering reduction while maintaining energy efficiency.

Sci. Robotics 2026-05-13

Autonomous seeking and mapping coral reef biodiversity hotspots with a multimodal AUV

Seth McCammon, Levi Cai, Daniel Yang, John Walsh, John D. Cast, T. Aran Mooney, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

Coral reefs are under threat worldwide. Yet, in these often-challenging marine habitats, biologists lack the tools to quantify the spatial heterogeneity of mobile reef fauna at high resolution (<1 meter), hampering monitoring and restoration efforts. Because of the diverse species present, multiple sensor modalities are needed to characterize the biodiversity. The urgent need to expand this study across unexplored reefs worldwide compounds these challenges. To address these problems, we propose a generative model of reef observations and develop a multimodal framework for seeking and mapping hotspots of biodiversity. In a case study on a healthy Caribbean reef, our autonomous underwater vehicle used passive acoustics and visual sensing to locate a biological hotspot around a large Dendrogyra pillar coral. We used the colocated multimodal data to self-validate the hotspot’s prominence, representing a technological step forward to help understand the ecological dynamics of coral reefs.

T-RO 2026-05-26

Canonical Policy: Learning Canonical 3-D Representation for $\text{SE}(3)$-Equivariant Policy

Zhiyuan Zhang, Zhengtong Xu, Jai Nanda Lakamsani, Yu She

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Visual imitation learning has achieved remarkable progress in robotic manipulation, yet generalization to unseen objects, scene layouts, and camera viewpoints remains a key challenge. Recent advances address this by using 3D point clouds, which provide geometry-aware, appearance-invariant representations, and by incorporating equivariance into policy architectures to exploit spatial symmetries. However, existing equivariant approaches often lack interpretability and rigor due to unstructured integration of equivariant components. We introduce canonical policy, a principled framework for 3D equivariant imitation learning that unifies 3D point cloud observations under a canonical representation. We first establish a theory of 3D canonical representations, enabling equivariant observation-to-action mappings by grouping both seen and novel point clouds to a canonical representation. We then propose a flexible policy learning pipeline that leverages geometric symmetries from canonical representation and the expressiveness of modern generative models. We validate canonical policy on 12 diverse simulated tasks and 4 real-world manipulation tasks across 16 configurations, involving variations in object color, shape, camera viewpoint, and robot platform. Compared to state-of-the-art imitation learning policies, canonical policy achieves an average improvement of 18.0% in simulation and 39.7% in real-world experiments, demonstrating superior generalization capability and sample efficiency.

IJRR 2026-05-27

Actuator fault recovery with deep reinforcement learning in a linear model-based control framework: Application to a physical AUV

Katell Lagattu, Ramy Alham, Thomas Chaffre, Eva Artusi, Paulo E. Santos, Gilles Le Chenadec, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

Actuator faults in autonomous mobile robotic systems pose significant challenges, especially in unpredictable environments where system reliability is paramount. Fault tolerant control (FTC) strategies, particularly those leveraging actuator redundancy, have been explored to address these issues. However, traditional methods commonly rely on explicit fault diagnosis, which can be resource-intensive and challenging to implement accurately. This paper introduces a novel approach that combines deep reinforcement learning (DRL) with a linearised optimal model-based controller to achieve actuator fault recovery without explicit fault diagnosis. The integration of DRL within a model-based controller framework enhances system stability and fault recovery capabilities. The chosen application platform for this study is an autonomous underwater vehicle (AUV), where the partial or total failure of a mission critical component such as a thruster could jeopardise the success of the mission and potentially render the vehicle unrecoverable in the event of a fault. In this application case, a linear quadratic regulator (LQR) controller is employed as the model-based controller, while the soft actor-critic (SAC) algorithm is used as the DRL component to handle fault recovery. The DRL model is trained and evaluated in simulation before being directly applied to the physical AUV. The proposed method’s effectiveness is demonstrated through comparisons with a standard LQR controller, a conventional adaptive LQR controller and the proposed hybrid LQR-SAC controller. The results indicate that the LQR-SAC controller outperforms the standard and conventional adaptive LQR controllers in maintaining system performance under fault conditions, achieving a significant reduction in trajectory tracking error on a physical AUV.
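
The hybrid structure described above, an LQR command plus a learned correction, can be sketched on a toy problem. Everything below is an assumption for illustration: a scalar discrete-time plant, hypothetical function names, and a `residual` callback that stands in for the trained SAC policy (it returns zero here, so no learning is shown). The `effectiveness` factor models a partial actuator fault by scaling the applied input.

```python
def dlqr_scalar(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati equation; return (gain k, cost p)."""
    p = q
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
    k = (b * p * a) / (r + b * p * b)
    return k, p

def simulate(a, b, k, x0, steps, residual=lambda x: 0.0, effectiveness=1.0):
    """Roll out x[t+1] = a*x + b*effectiveness*u with u = -k*x + residual(x)."""
    x = x0
    for _ in range(steps):
        u = -k * x + residual(x)       # LQR term plus placeholder learned term
        x = a * x + b * effectiveness * u
    return x

# Hypothetical plant: a marginally stable integrator with a weak actuator.
gain, cost = dlqr_scalar(a=1.0, b=0.1, q=1.0, r=1.0)
nominal = simulate(1.0, 0.1, gain, x0=1.0, steps=200)
faulty = simulate(1.0, 0.1, gain, x0=1.0, steps=200, effectiveness=0.5)
```

With half the actuator authority the fixed LQR gain still regulates this toy plant, but more slowly than in the nominal case. That degradation is the gap a learned residual would be trained to close.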

RA-L 2026-05-11 · Cited by 3

Neural Power-Optimal Magnetorquer Solution for Multi-Agent Formation and Attitude Control

Yuta Takahashi, Shin-ichiro Sakai

Robot Learning · Multi-Robot / Swarms
Abstract

This paper presents a learning-based current calculation framework to achieve power-optimal magnetic-field interaction for multi-agent formation and attitude control. In aerospace engineering, electromagnetic coils are referred to as magnetorquer and used as satellite attitude actuators in Earth's orbit and for long-term formation and attitude control. This study derives a unique, continuous, and power-optimal current solution via sequential convex programming and approximates it using a multilayer perceptron model. The effectiveness of our strategy was demonstrated through numerical simulations and experimental trials on the formation and attitude control.

T-RO 2026-05-22

Hidden Markov Model-Based Shared Autonomy for Grip Strength Regulation in sEMG Driven Robot Hand Control

Alessandra Bernardini, Roberto Meattini, Alex Pasquali, Gianluca Laudante, Cosimo Gentile, Emanuele Gruppioni, et al.

Humanoid Robots · Manipulation & Robot Arms · Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract

The integration of robots into human environments is advancing rapidly, driven by the demand for systems that combine robot accuracy and repeatability with human flexibility and adaptability. In this context, human-centered manipulation applications must address uncertainties arising from human, robotic, and environmental factors. As a result, effective robotic manipulation requires both accurate pre-grasping motions and precise grip strength control, especially in tasks where robotic devices are remotely controlled to grip objects with fine, desired, and adjustable grip force. The present work tackles the challenges posed by uncertainties and non-ideal conditions in surface electromyography (sEMG)-driven human-in-the-loop (HITL) robot hand control applications. In this regard, a novel probabilistic shared autonomy framework for fine grip strength regulation is introduced, leveraging Hidden Markov Models (HMMs) applied to tactile data to encode the HITL grasping action into proper, probabilistically consistent phases. These phases are then exploited to modulate the level of shared autonomy between the human operator and the robot hand, enabling precise control over grip strength. The presented shared autonomy framework was evaluated under multiple experimental conditions, testing grip force regulation performance with a group of 10+10 intact-limb participants and a participant with amputation in both static (fixed hand-object configuration) and dynamic (pick-and-place and recipe preparation) grasping tasks, with differentiated goals inspired by real-world requirements. Moreover, to explore generalizability, experiments were conducted with both anthropomorphic and industrial robotic hands, properly equipped with tactile sensors. Experimental outcomes are supported by statistical analysis and show, for the considered sample, the effectiveness of the proposed shared autonomy control architecture in achieving fine, smooth, and controllable grip strength regulation with respect to the baseline case in absence of our approach.
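
To make the idea of encoding a grasp into probabilistic phases concrete, here is a minimal sketch of the HMM forward filter, which tracks the posterior over phases as observations arrive. The three phases (approach, contact, hold), the three discretized tactile levels, and every probability below are hypothetical values chosen for illustration; the paper's actual model and how it maps phases to autonomy levels are not reproduced.

```python
def forward_filter(init, trans, emit, observations):
    """Return P(state_t | obs_1..t) for each step of a discrete HMM."""
    n = len(init)
    belief = [init[s] * emit[s][observations[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    history = [belief]
    for obs in observations[1:]:
        # Predict with the transition model, then correct with the emission model.
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        belief = [predicted[j] * emit[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        history.append(belief)
    return history

# Hypothetical phases: 0 = approach, 1 = contact, 2 = hold.
init = [0.9, 0.08, 0.02]
trans = [[0.8, 0.2, 0.0],
         [0.0, 0.7, 0.3],
         [0.0, 0.0, 1.0]]
# Hypothetical tactile levels: 0 = none, 1 = light, 2 = firm.
emit = [[0.9, 0.09, 0.01],
        [0.1, 0.7, 0.2],
        [0.02, 0.18, 0.8]]
beliefs = forward_filter(init, trans, emit, [0, 0, 1, 1, 2, 2, 2])
```

In a shared-autonomy loop, a quantity such as the current probability of the hold phase, `beliefs[-1][2]`, could serve as a blending weight between operator and robot commands. That mapping is an assumption here, not the paper's design.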

RA-L 2026-06-16

SAGA: Open-World Mobile Manipulation via Structured Affordance Grounding

Kuan Fang, Yuxin Chen, Xinghao Zhu, Farzad Niroui, Lingfeng Sun, Jiuguang Wang

Humanoid Robots · Legged / Quadruped Robots · Manipulation & Robot Arms · Robot Learning
Abstract

We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn such capability, our key idea is to disentangle high-level semantic intent from low-level visuomotor control by explicitly grounding task objectives in the observed environment. Using an affordance-based task representation, we express diverse and complex behaviors in a unified, structured form. By leveraging multimodal foundation models, SAGA grounds the proposed task representation to the robot's visual observation as 3D affordance heatmaps, highlighting task-relevant entities while abstracting away spurious appearance variations that would hinder generalization. These grounded affordances enable us to effectively train a conditional policy on multi-task demonstration data for whole-body control. In a unified framework, SAGA can solve tasks specified in different forms, including language instructions, selected points, and example demonstrations, enabling both zero-shot execution and few-shot adaptation. We instantiate SAGA on a quadrupedal manipulator and conduct extensive experiments across eleven real-world tasks. SAGA consistently outperforms end-to-end and modular baselines by substantial margins. Together, these results demonstrate that structured affordance grounding offers a scalable and effective pathway toward generalist mobile manipulation.

RA-L 2026-06-16

DynaMimicGen: A Data Generation Framework for Robot Learning of Dynamic Tasks

Vincenzo Pomponi, Paolo Franceschi, Stefano Baraldo, Oliver Avram, Loris Roveda, Luca Maria Gambardella, et al.

Manipulation and robot arms · Robot learning · Perception and sensing
Abstract

Learning robust manipulation policies typically requires large and diverse datasets, the collection of which is time-consuming, labor-intensive, and often impractical for dynamic environments. In this work, we introduce DynaMimicGen (D-MG), a scalable dataset generation framework that enables policy training from minimal human supervision while uniquely supporting dynamic task settings. Given only a few human demonstrations, D-MG first segments the demonstrations into meaningful sub-tasks, then leverages Dynamic Movement Primitives (DMPs) to adapt and generalize the demonstrated behaviors to novel and dynamically changing environments. Improving on prior methods that rely on static assumptions or simplistic trajectory interpolation, D-MG produces smooth, realistic, and task-consistent Cartesian trajectories that adapt in real time to changes in object poses, robot states, or scene geometry during task execution. Our method supports different scenarios (including scene layouts, object instances, and robot configurations), making it suitable for both static and highly dynamic manipulation tasks. We show that robot agents trained via imitation learning on D-MG-generated data achieve strong performance across long-horizon and contact-rich benchmarks, including tasks like cube stacking and placing mugs in drawers, even under unpredictable environment changes. By eliminating the need for extensive human demonstrations and enabling generalization in dynamic settings, D-MG offers a powerful and efficient alternative to manual data collection, paving the way toward scalable, autonomous robot learning.
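To make the role of DMPs concrete, here is a minimal one-dimensional sketch of the spring-damper transformation system at the core of a DMP, with the learned forcing term omitted. It is not D-MG's implementation: the gains, the time step, and the `goal_fn` callback are assumptions chosen for illustration. It only shows why a DMP-style trajectory can follow a goal that changes during execution.

```python
def rollout_dmp(y0, goal_fn, duration=2.0, dt=0.001,
                alpha=25.0, beta=6.25, tau=1.0):
    """Integrate tau*dz = alpha*(beta*(g - y) - z), tau*dy = z with Euler steps.

    goal_fn(t) returns the (possibly moving) goal at time t.
    beta = alpha / 4 makes the system critically damped.
    The forcing term of a full DMP is deliberately left out.
    """
    y, z = y0, 0.0
    trajectory = []
    for k in range(int(duration / dt)):
        g = goal_fn(k * dt)
        dz = alpha * (beta * (g - y) - z) / tau
        dy = z / tau
        z += dz * dt
        y += dy * dt
        trajectory.append(y)
    return trajectory


def moving_goal(t):
    """Hypothetical scenario: the target pose jumps halfway through."""
    return 0.5 if t < 1.0 else 0.8


if __name__ == "__main__":
    traj = rollout_dmp(0.0, moving_goal)
    print("position just before the goal change:", round(traj[999], 4))
    print("final position:", round(traj[-1], 4))
```

The rollout settles near the first goal, then converges to the new goal after the switch without re-planning, which is the property that makes this family of primitives attractive for dynamic scenes.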

Sci. Robotics 2026-05-13

RAPTOR: A foundation policy for quadrotor control

Jonas Eschmann, Dario Albani, Giuseppe Loianno

Drones / aerial robots · Robot learning
Abstract

Humans are remarkably data efficient when adapting to previously unseen conditions, like driving a new car. In contrast, modern robotic control systems, like neural network policies trained using reinforcement learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down even under small differences like the simulation-to-reality gap and require system identification and retraining for even minimal changes to the system. Here, we present RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. Our method enables training a single, end-to-end neural network policy to control a wide variety of quadrotors. We tested 10 different real quadrotors, from 32 grams to 2.4 kilograms, that also differed in motor type (brushed versus brushless), frame type (soft versus rigid), propeller type (two, three, or four blades), and flight controller (PX4, Betaflight, Crazyflie, M5StampFly). We found that a tiny, three-layer policy with only 2084 parameters was sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through in-context learning was made possible by using a recurrence in the hidden layer. The policy was trained through our proposed meta-imitation learning algorithm, where we sampled 1000 quadrotors and trained a teacher policy for each of them using RL. The 1000 teachers were distilled into a single, adaptive student policy. We found that within milliseconds, the resulting foundation policy adapted zero-shot to unseen quadrotors. We tested the capabilities of the foundation policy under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, and different propellers).

RA-L 2026-06-15

DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction

Jingkai Sun, Gang Han, Pihai Sun, Wen Zhao, Jiahang Cao, Jiaxu Wang, et al.

Humanoid robots · Legged / quadruped robots · Navigation / SLAM / autonomous driving · Robot learning · Perception and sensing
Abstract

Recent advancements in legged robot perceptive locomotion have shown promising progress. However, terrain-aware humanoid locomotion remains largely constrained to two paradigms: depth image-based end-to-end learning and elevation map-based methods. The former suffers from limited training efficiency and a significant sim-to-real gap in depth perception, while the latter depends heavily on multiple vision sensors and localization systems, resulting in latency and reduced robustness. To overcome these challenges, we propose a novel framework that tightly integrates three key components: (1) Terrain-Aware Locomotion Policy with a Blind Backbone, which leverages pre-trained elevation map-based perception to guide reinforcement learning with minimal visual input; (2) Multi-Modality Cross-Attention Transformer, which reconstructs structured terrain representations from noisy depth images; (3) Realistic Depth Images Synthetic Method, which employs self-occlusion-aware ray casting and noise-aware modeling to synthesize realistic depth observations, achieving over 30% reduction in terrain reconstruction error. This combination enables efficient policy training with limited data and hardware resources, while preserving critical terrain features essential for generalization. We validate our framework on a full-sized humanoid robot, demonstrating agile and adaptive locomotion across diverse and challenging terrains.

RA-L 2026-06-15

Pixel2Catch: Multi-Agent Sim-to-Real Transfer for Agile Manipulation with a Single RGB Camera

Seongyong Kim, Junhyeon Cho, Kang-Won Lee, Soo-Chul Lim

Manipulation and robot arms · Robot learning · Multi-robot / swarms
Abstract

To catch a thrown object, a robot must be able to perceive the object's motion and generate control actions in a timely manner. Rather than explicitly estimating the object's 3D position, this work focuses on a novel approach that recognizes object motion using pixel-level visual information extracted from consecutive RGB frames. Such visual cues capture changes in the object's position and scale, allowing the policy to reason about the object's motion. Furthermore, to achieve stable learning in a high-DoF system composed of a robot arm equipped with a multi-fingered hand, we design a heterogeneous multi-agent reinforcement learning framework that defines the arm and hand as independent agents with distinct roles. Each agent is trained cooperatively using role-specific observations and rewards, and the learned policies are successfully transferred from simulation to the real world. Project page: https://seongdrgn.github.io/pixel2catch/

RA-L 2026-06-15

High-Speed Vision-Based Flight in Clutter with Safety-Shielded Reinforcement Learning

Jiarui Zhang, Chengyong Lei, Chengjiang Dai, Kenghou Hoi, Lijie Wang, Zhichao Han, et al.

Drones / aerial robots · Navigation / SLAM / autonomous driving · Robot learning · Perception and sensing
Abstract

Quadrotor unmanned aerial vehicles (UAVs) are increasingly deployed in complex missions that demand reliable autonomous navigation and robust obstacle avoidance. However, traditional modular pipelines often incur cumulative latency, whereas pure reinforcement learning (RL) approaches typically provide limited formal safety guarantees. To bridge this gap, we propose an end-to-end RL framework augmented with model-based safety mechanisms. We incorporate physical priors in both training and deployment. During training, we design a physics-informed reward structure that provides global navigational guidance. During deployment, we integrate a real-time safety filter that projects the policy outputs onto a provably safe set to enforce strict collision-avoidance constraints. This hybrid architecture reconciles high-speed flight with robust safety assurances. Benchmark evaluations demonstrate that our method outperforms both traditional planners and recent end-to-end obstacle avoidance approaches based on differentiable physics. Extensive experiments demonstrate strong generalization, enabling reliable high-speed navigation in dense clutter and challenging outdoor forest environments at velocities up to 7.5 m/s.
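The idea of projecting a policy output onto a safe set can be sketched with the simplest possible case: a single half-space constraint on the commanded velocity. The abstract does not say how the safe set is constructed, so the half-space form, the function name, and the numbers below are assumptions for illustration only.

```python
def project_onto_halfspace(command, normal, bound):
    """Return the closest point to `command` satisfying normal . v <= bound.

    `normal` must be non-zero. If the command is already safe it is
    returned unchanged; otherwise the violating component along
    `normal` is removed (Euclidean projection).
    """
    dot = sum(c * n for c, n in zip(command, normal))
    if dot <= bound:
        return list(command)
    norm_sq = sum(n * n for n in normal)
    scale = (dot - bound) / norm_sq
    return [c - scale * n for c, n in zip(command, normal)]


if __name__ == "__main__":
    # Hypothetical case: an obstacle ahead limits forward speed to 2 m/s.
    policy_velocity = [7.5, 1.0, 0.0]
    safe_velocity = project_onto_halfspace(policy_velocity, [1.0, 0.0, 0.0], 2.0)
    print(safe_velocity)
```

Only the component that violates the constraint is altered, so the filter stays out of the way whenever the learned policy is already safe.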

RA-L 2026-06-15

Collision-Free Humanoid Traversal in Cluttered Indoor Scenes

Han Xue, Sikai Liang, Zhikai Zhang, Zicheng Zeng, Yun Liu, Yunrui Lian, et al.

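Humanoid robots · Navigation / SLAM / autonomous driving · Robot learning · Perception and sensing · Human-robot interaction / teleoperation
Abstract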

We study the problem of collision-free humanoid traversal in cluttered indoor scenes, such as hurdling over objects scattered on the floor, crouching under low-hanging obstacles, or squeezing through narrow passages. To achieve this goal, the humanoid needs to map its perception of surrounding obstacles with diverse spatial layouts and geometries to the corresponding traversal skills. However, the lack of an effective representation that captures humanoid–obstacle relationships during collision avoidance makes directly learning such mappings difficult. We therefore propose Humanoid Potential Field (HumanoidPF), which encodes these relationships as collision-free motion directions, significantly facilitating RL-based traversal skill learning. We also find that HumanoidPF exhibits a surprisingly negligible sim-to-real gap as a perceptual representation. To enable generalizable traversal skills through diverse and challenging cluttered indoor scenes, we further propose a hybrid scene generation method, incorporating crops of realistic 3D indoor scenes and procedurally synthesized obstacles. We successfully transfer our policy to the real world and develop a teleoperation system where users can command the humanoid to traverse cluttered indoor scenes with just a single click. Extensive experiments are conducted in both simulation and the real world to validate the effectiveness of our method.
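As background for the potential-field idea, the sketch below computes a motion direction from a classic attractive/repulsive potential field in 2D. This is the textbook formulation, not HumanoidPF itself, whose construction the abstract does not detail; the gains, the influence radius, and the point-obstacle model are assumptions.

```python
import math


def potential_field_direction(position, goal, obstacles,
                              influence=1.0, k_att=1.0, k_rep=0.5):
    """Return a unit direction combining goal attraction and obstacle repulsion.

    Obstacles are modelled as points and repel only within `influence`
    metres (classic artificial potential field).
    """
    fx = k_att * (goal[0] - position[0])
    fy = k_att * (goal[1] - position[1])
    for ox, oy in obstacles:
        dx, dy = position[0] - ox, position[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence:
            magnitude = k_rep * (1.0 / dist - 1.0 / influence) / dist ** 2
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return (0.0, 0.0)
    return (fx / norm, fy / norm)


if __name__ == "__main__":
    free = potential_field_direction((0.0, 0.0), (2.0, 0.0), [])
    near = potential_field_direction((0.0, 0.0), (2.0, 0.0), [(0.5, -0.2)])
    print("no obstacle:", free)
    print("obstacle below the path:", near)
```

With an obstacle slightly below the straight-line path, the resulting direction still points toward the goal but bends upward, away from the obstacle.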

IJRR 2026-05-24

NavWareSet: A dataset of socially compliant and non-compliant robot navigation

Johnata Brayan, Sihao Deng, Armando Alves Neto, Iaroslav Okunevich, Tomas Krajnik, Francois Bremond, et al.

Navigation / SLAM / autonomous driving · Perception and sensing · Human-robot interaction / teleoperation
Abstract

This paper presents NavWareSet, a novel dataset crafted to advance socially compliant robot navigation research. NavWareSet provides multi-modal recordings of both socially compliant and non-compliant robot trajectories in controlled indoor environments. Drawing upon seven carefully selected scenarios, it captures complex human-robot interactions and a range of navigation challenges that mirror realistic social contexts. NavWareSet establishes a rich dataset for evaluating and training navigation algorithms by incorporating two distinct robot platforms, Toyota Human Support Robot (HSR) and Clearpath’s Jackal, and systematically varying their navigation behaviors. With data modalities spanning lidar, RGB-D camera, odometry, and human position annotations, NavWareSet enables fine-grained analysis of the robot’s decision-making process and its impact on human comfort and safety. Ultimately, this dataset provides a versatile resource for developing robust, ethically guided navigation policies and for measuring their performance across a range of social situations. More information can be seen at: https://anr-navware.github.io/navwareset/

T-RO 2026-05-26

Pushing Physical Limits and Uncovering Motion Templates of Spine-Based Quadruped Locomotion via Reinforcement Learning

Zhenshan Bing, Yulong Xiao, Yuhong Huang, Qing Shi, Long Cheng, Biao Hu, et al.

Legged / quadruped robots · Robot learning
Abstract

Flexible spines are critical to the remarkable agility and speed of animals. Translating this biological advantage to quadruped robots presents a significant control challenge, particularly in coordinating the spine and limbs for maximal velocity. In this work, we utilize reinforcement learning (RL) to develop high-speed locomotion for a bioinspired mouse robot with a lateral flexible spine. The resulting controller achieves motor performance that demonstrably surpasses non-spined and model-based methods. More importantly, our analysis reveals the principles behind this performance: the emergence of two distinct motion templates. For high-speed walking, the robot learns a “whip-like” spinal oscillation to increase leg swing frequency, while for agile turning, it adopts a dynamic “bend-and-straighten” pattern. These findings demonstrate the capability of RL to not only generate high-performance controllers but also to produce emergent strategies that, upon analysis, reveal underlying principles of high-speed, spine-driven locomotion.

T-RO 2026-05-26

Globally Consistent RGB-D SLAM With 2-D Gaussian Splatting

Xingguang Zhong, Yue Pan, Liren Jin, Marija Popović, Jens Behley, Cyrill Stachniss

Navigation / SLAM / autonomous driving · Perception and sensing
Abstract

Recently, 3D Gaussian splatting-based RGB-D SLAM has shown remarkable performance in high-fidelity 3D reconstruction. However, 3D Gaussian splatting (3DGS) suffers from a lack of depth rendering consistency, which leads to suboptimal geometric reconstruction in 3DGS-based SLAM. In addition, 3DGS-based SLAM methods typically lack efficient loop closure, limiting their ability to build globally consistent maps online. In this paper, we present 2DGS-SLAM, an RGB-D SLAM system using 2D Gaussian splatting as map representation. By leveraging the depth-consistent rendering property of the 2D variant, we propose an accurate camera pose optimization method and achieve geometrically accurate 3D reconstruction. In addition, we implement efficient loop detection and camera relocalization by leveraging MASt3R, a feed-forward 3D reconstruction model, and achieve efficient map updates by maintaining a local active map. Experiments show that our 2DGS-SLAM approach achieves superior tracking accuracy, higher surface reconstruction quality, and more consistent global map reconstruction compared to existing rendering-based SLAM methods, while maintaining high-fidelity image rendering and improved computational efficiency.

T-RO 2026-05-26

The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning

Jan Ole von Hartz, Adrian Röfer, Joschka Boedecker, Abhinav Valada

Manipulation and robot arms · Robot learning
Abstract

We present Mixture of Discrete-time Gaussian Processes (MiDiGaP), a novel approach for flexible policy representation and imitation learning in robot manipulation. MiDiGaP enables learning from as few as five demonstrations using only camera observations and generalizes across a wide range of challenging tasks. It excels at long-horizon behaviors such as making coffee, highly constrained motions such as opening doors, dynamic actions such as scooping with a spatula, and multimodal tasks such as hanging a mug. MiDiGaP learns these tasks on a CPU in less than a minute and scales linearly to large datasets. We also develop a rich suite of tools for inference-time steering using evidence such as collision signals and robot kinematic constraints. This steering enables novel generalization capabilities, including obstacle avoidance and cross-embodiment policy transfer. MiDiGaP achieves state-of-the-art performance on diverse few-shot manipulation benchmarks. On constrained RLBench tasks, it improves policy success by 76 percentage points and reduces trajectory cost by 67%. On multimodal tasks, it improves policy success by 48 percentage points while improving sample efficiency 7-fold. In cross-embodiment transfer, it more than doubles policy success. We make the code publicly available.

T-RO 2026-05-26

A Caterpillar-Type Miniature Robot for Adaptive Locomotion and Exploration of Tiny Rigid/Soft Pipes

Jinyang Gao, Zhengtao Hu, Jingyao Zhang, Yanfei Cao, Kunpeng Zhang, Guozheng Yan, et al.

Legged / quadruped robots · Navigation / SLAM / autonomous driving
Abstract

Caterpillar-type robots are widely used for medium- and large-sized pipe inspections. However, existing prototypes smaller than 80 mm lack both an active variable diameter capability and a contact force sensing function, which are crucial for safe and automatic exploration of unknown rigid/soft pipes (e.g., the colon). This study develops a variable diameter caterpillar-type miniature robot (VCMR) featuring a small size of Φ34.6 mm × 41 mm, a large variable-diameter range of 34.6-89.6 mm, and an integrated contact force sensing function. The VCMR actively adapts to pipe diameter changes using contact force feedback, demonstrates high load capacity in both vertical and horizontal rigid/soft pipes, and traverses a 150-cm colon phantom with sharp bends at an average velocity of 3.07 ± 0.48 cm/s. It holds promise for exploring tiny variable-diameter rigid/soft pipes and delivering cargoes through such pipes.

T-RO 2026-05-26

Event-Based De-Snowing for Autonomous Driving

Manasi Muglikar, Nico Messikommer, Marco Cannici, Davide Scaramuzza

Navigation / SLAM / autonomous driving · Perception and sensing
Abstract

Adverse weather conditions, particularly heavy snowfall, pose significant challenges to both human drivers and autonomous vehicles. Traditional image-based desnowing methods often introduce hallucination artifacts as they rely solely on spatial information, while video-based approaches require high frame rates and suffer from alignment artifacts at lower frame rates. Camera parameters, such as exposure time, also influence the appearance of snowflakes, making the problem difficult to solve and heavily dependent on network generalization. In this paper, we propose to address the challenge of desnowing by using event cameras, which offer compressed visual information with submillisecond latency, making them ideal for desnowing images, even in the presence of ego-motion. Our method leverages the fact that snowflake occlusions appear with a very distinctive streak signature in the spatiotemporal representation of event data. We design an attention-based module that focuses on events along these streaks to determine when a background point was occluded and use this information to recover its original intensity. We benchmark our method on DSEC-Snow, a new dataset created using a green-screen technique that overlays pre-recorded snowfall data onto the existing DSEC driving dataset, resulting in precise ground truth and synchronized image and event streams. Our approach outperforms state-of-the-art desnowing methods by 3 dB in PSNR for image reconstruction. Moreover, we show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as depth estimation and optical flow, achieving a 20% performance improvement over other desnowing methods. Our work represents a crucial step towards enhancing the reliability and safety of vision systems in challenging winter conditions, paving the way for more robust, all-weather-capable applications.
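For readers less familiar with the metric, the reported 3 dB PSNR gain can be read through the standard PSNR definition. The relation below is a general property of PSNR rather than a result from the paper; $\mathrm{MAX}$ denotes the peak pixel value, and the subscripts "old" and "new" are labels introduced here for the compared methods.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
\quad\Longrightarrow\quad
\mathrm{PSNR}_{\text{new}} - \mathrm{PSNR}_{\text{old}}
  = 10 \log_{10}\!\left(\frac{\mathrm{MSE}_{\text{old}}}{\mathrm{MSE}_{\text{new}}}\right) = 3\ \text{dB}
\quad\Longrightarrow\quad
\frac{\mathrm{MSE}_{\text{new}}}{\mathrm{MSE}_{\text{old}}} = 10^{-0.3} \approx 0.50
```

Assuming both methods are evaluated on the same images, a 3 dB gain therefore corresponds to roughly halving the mean squared reconstruction error.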

Sci. Robotics 2026-03-25 · Cited by 1

Milliwatt ultrasound for navigation in visually degraded environments on palm-sized aerial robots

Manoj Velmurugan, Phillip Brush, Colin Balfour, Richard J. Przybyla, Nitin J. Sanket

Drones / aerial robots · Navigation / SLAM / autonomous driving · Robot learning · Perception and sensing
Abstract

Tiny palm-sized aerial robots have exceptional agility and cost-effectiveness in navigating confined and cluttered environments. However, their limited payload capacity directly constrains the sensing suite onboard the robot, thereby limiting critical navigational tasks in Global Positioning System (GPS)–denied wild scenes. Common methods for obstacle avoidance use cameras and light detection and ranging (LIDAR), which become ineffective under visually degraded conditions such as low visibility, dust, fog, or darkness. Other sensors, such as radio detection and ranging (RADAR), have high power consumption, making them unsuitable for tiny aerial robots. Inspired by bats, we propose Saranga, a low-power, ultrasound-based perception stack that localizes obstacles using a dual sonar array. We present two key solutions to combat the low peak signal-to-noise ratio of −4.9 decibels: physical noise reduction and a deep learning–based denoising method. First, we present a practical way to block propeller-induced ultrasound noise on the weak echoes. The second solution is to train a neural network to use the long horizon of ultrasound echoes for finding signal patterns under high amounts of uncorrelated noise where classical methods were insufficient. We generalized to the real world by using a synthetic data generation pipeline augmented with limited real noise data for training. We enabled a palm-sized aerial robot to navigate under visually degraded conditions of dense fog, darkness, and snow in a cluttered environment with thin and transparent obstacles using only onboard sensing and computation. We provide extensive real-world results to demonstrate the efficacy of our approach.

IJRR 2026-05-29

A ground robot dataset for multi-sensor navigation in diverse environments

Ban Li, Jianghui Geng, Pai Wang, Yuanxin Wu, Hui Cheng, Hang Shi

Navigation / SLAM / autonomous driving · Perception and sensing
Abstract

Reliable absolute positioning remains a challenge in autonomous ground robotics, particularly in complex and dynamic real-world environments. The fusion of drift-free global positioning and precise local positioning is essential to ensure continuous and accurate localization in mobile ground robots. However, a benchmark dataset encompassing challenging scenarios for both absolute and relative positioning is still lacking, which limits further research and comprehensive evaluation of fusion-based Simultaneous Localization and Mapping (SLAM) methods for autonomous ground robots. To fill this gap, we introduce a ground robot dataset for multi-sensor navigation in diverse environments. All sensors are well calibrated, and Global Navigation Satellite System (GNSS), Inertial Measurement Unit (IMU), camera, and Light Detection and Ranging (LiDAR) measurements are hardware-synchronized. Furthermore, several auxiliary sensors are also included in our system, which are often overlooked in existing datasets but may be vital in certain applications. We perform data acquisition in a variety of challenging environments, both outdoors and indoors. In outdoor scenarios, ground truth is provided by a high-level integrated navigation system, while in indoor environments, it is obtained using a motion capture system. We evaluate the positioning performance of several baseline algorithms on our dataset, and the results show that current methods need further improvement in specific challenging scenarios. To advance relevant research, we make the dataset and associated tools publicly available. The project can be accessed at https://github.com/lizhipro/MSN-DE.git

RA-L 2026-06-12

Module-Level 3D Motion Perception and Closed-Loop Control of an SMA-Driven Origami Robotic Module for Versatile Robotic Systems

Lei Zhang, Yiming Ouyang, Jingwen Kong, Qiqiang Hu, Shiwu Zhang, Hu Jin

Manipulation and robot arms · Perception and sensing · Control and dynamics
Abstract

Origami robotic modules offer compactness, compliance, and reconfigurability, but their system-level capabilities are often limited by the lack of reliable self-perception and closed-loop 3D motion control at the module level. This work presents a compact and lightweight shape memory alloy (SMA)-driven origami robotic module that achieves self-sensing and closed-loop three-dimensional motion control. The module integrates a foldable origami skeleton with four symmetrically arranged SMA spring actuators for omnidirectional bending and axial contraction. Embedded Hall-effect sensors enable intrinsic three-dimensional motion perception by measuring the module deformation state. A geometric kinematic model is established to map the skeleton deformation to the central-axis ending position of the module for reliable feedback control. Experimental results show mean calibration errors of 0.21 mm (x), 0.15 mm (y), 0.33 mm (z), and 4.78° for orientation estimation. Benefiting from module-level perception and control, multiple modules can be coordinated to perform crawling, confined-space climbing, walking, and object manipulation tasks. These results demonstrate the importance of closed-loop module-level motion control for reconfigurable modular robotic systems.

RA-L 2026-06-12

Sem-NaVAE: Semantically-Guided Outdoor Mapless Navigation via Generative Trajectory Priors

Gonzalo Olguín, Javier Ruiz-del-Solar

Navigation / SLAM / autonomous driving · Robot learning · Perception and sensing
Abstract

This work presents a mapless navigation approach for outdoor applications. It combines the exploratory capacity of conditional variational autoencoders (CVAEs) to generate trajectories and the semantic segmentation capabilities of a lightweight visual language model (VLM) to select the trajectory to execute. Open-vocabulary segmentation is used to score and select the generated trajectories based on natural language, and a state-of-the-art local planner executes velocity commands. One of the key features of the proposed approach is its ability to generate a wide variety of trajectories and select among them to navigate in real time. In real-world outdoor experiments, Sem-NaVAE achieves a 90% success rate across routes of 120–240 m in unseen environments, outperforming the nearest baseline by 10% while remaining within 7% of a map-based upper bound. A video showing an experimental run of the system can be found at https://youtu.be/i3R5ey5O2yk.

RA-L 2026-06-12

Multi-Solution Inverse Kinematics for Robotic Manipulators via Permutation-Invariant Set Prediction

Duc Tien Nguyen, Van Thanh Tri Nguyen, Truong Do, Vu Linh Nguyen

Manipulation and robot arms · Robot learning · Perception and sensing
Abstract

Inverse kinematics (IK) for 6-DoF manipulators is inherently set-valued, since multiple distinct joint configurations can realize the same end-effector pose. Nevertheless, learning-based IK is frequently posed as single-output regression, which can induce mode collapse and unstable predictions near branch boundaries. In this work, IK is cast as a deterministic set-prediction problem, in which an unordered set of feasible joint candidates is produced in a single forward pass. A multi-head network is trained with permutation-invariant supervision via optimal bipartite matching, eliminating the need for a fixed global ordering of IK branches. High-fidelity multi-solution supervision on a UR3 platform is obtained through a hybrid analytic–numerical procedure, in which branch seeds are enumerated and refined using Levenberg–Marquardt optimization, followed by feasibility filtering. At deployment, the predicted set is mapped to an executable joint-command sequence through a continuity-aware selection rule, thereby promoting temporally consistent branch choices. On real UR3 trajectories, improved set fidelity and branch coverage are observed relative to single-solution and fixed-assignment baselines, while accurate joint-space and task-space tracking is maintained on held-out motions. Runtime benchmarks further indicate that multi-branch inference is faster than repeated numerical IK when multiple solutions are required.
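The permutation-invariant supervision described in this abstract can be sketched as a matching loss: predicted joint candidates are compared with the ground-truth solution set under the best one-to-one assignment, so the order of the network heads does not matter. The sketch below is an illustration under assumptions, not the paper's loss: it finds the optimal assignment by brute force over permutations (adequate for the handful of IK branches of a 6-DoF arm, where a Hungarian solver would normally be used), and the wrapped absolute angle distance is a stand-in cost.

```python
import math
from itertools import permutations


def joint_distance(a, b):
    """Sum of absolute joint differences, wrapped to [-pi, pi]."""
    return sum(
        abs(math.atan2(math.sin(x - y), math.cos(x - y)))
        for x, y in zip(a, b)
    )


def matched_loss(predicted, targets):
    """Return (mean matched cost, assignment) under the optimal matching.

    assignment[t] is the index of the prediction matched to targets[t].
    Requires len(predicted) >= len(targets) >= 1. Brute force, so only
    suitable for small sets.
    """
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(predicted)), len(targets)):
        cost = sum(
            joint_distance(predicted[p], targets[t])
            for t, p in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost / len(targets), best_perm


if __name__ == "__main__":
    # Hypothetical 2-joint solution set; the prediction is a reordered copy.
    targets = [[0.0, 1.0], [1.5, -0.5], [-1.0, 0.3]]
    predicted = [targets[2], targets[0], targets[1]]
    print(matched_loss(predicted, targets))
```

Because the loss is computed after matching, a network that outputs the correct solutions in a different head order is not penalised, which is what removes the need for a fixed global ordering of branches.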

RA-L 2026-06-12

Breaking Time: A Fully Gaussian Framework for Distributed and Continuous-Time SLAM

Davide Ceriola, Simone Ferrari, Luca Di Giammarino, Leonardo Brizi, Giorgio Grisetti

Navigation / SLAM / autonomous driving · Perception and sensing · Multi-robot / swarms
Abstract

Continuous-time SLAM provides a principled framework for fusing heterogeneous sensors while estimating smooth trajectories, and is particularly well-suited for handling heterogeneous, asynchronous sensor streams with non-uniform readout patterns, such as rolling shutter cameras, LiDAR scanners, radar sweeps, or event-based sensors. In this work, we introduce G-solver, a fully Gaussian and distributed framework that combines Gaussian Belief Propagation (GBP) with Gaussian Process (GP) motion priors for continuous-time trajectory estimation. Our GP model provides a probabilistic representation of the trajectory, enabling consistent interpolation and the use of data-driven hyperparameters, while GBP offers a scalable message-passing formulation well-suited for decentralized settings. The resulting solver naturally extends to multi-camera scenarios without specialized synchronization or engineering effort. We evaluate the approach on synthetic and real data, including rolling shutter and distributed multi-camera optimization, demonstrating accurate and stable estimation with runtimes comparable to existing continuous-time methods. An open-source implementation is released at https://github.com/rvp-group/gsolver.

RA-L 2026-06-12

Velocity-Form Data-Enabled Predictive Control of Soft Robots under Unknown External Payloads

Huanqing Wang, Kaixiang Zhang, Kyungjoon Lee, Yu Mei, Vaibhav Srivastava, Jun Sheng, et al.

Manipulation and robot arms · Medical / soft / micro-nano · Control and dynamics
Abstract

Data-driven control methods such as data-enabled predictive control (DeePC) have shown strong potential for efficient control of soft robots without explicit parametric models. However, in object manipulation tasks, unknown external payloads and disturbances can significantly alter the system dynamics and behavior, leading to offset errors and degraded control performance. In this paper, we present a novel velocity-form DeePC framework that achieves robust and optimal control of soft robots under unknown payloads. The proposed framework leverages input/output data in an incremental representation to mitigate performance degradation induced by unknown payloads, eliminating the need for weighted datasets or disturbance estimators. We validate the method experimentally on a planar soft robot and demonstrate its improved performance compared to standard DeePC in scenarios involving unknown payloads.
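A small sketch can show why an incremental (velocity-form) data representation helps with unknown payloads. DeePC-style methods stack recorded input/output data into Hankel matrices; if the data are first differenced, a constant offset such as one caused by a steady payload drops out of the matrix entirely. The helper names and the toy signal below are assumptions, and the predictive-control optimisation that would use these matrices is omitted.

```python
def increments(signal):
    """Return the first differences signal[k] - signal[k-1]."""
    return [b - a for a, b in zip(signal, signal[1:])]


def hankel(sequence, depth):
    """Build a Hankel matrix with `depth` rows from a scalar sequence."""
    columns = len(sequence) - depth + 1
    if columns < 1:
        raise ValueError("sequence too short for the requested depth")
    return [[sequence[r + c] for c in range(columns)] for r in range(depth)]


if __name__ == "__main__":
    measured = [1.0, 2.0, 4.0, 7.0]                 # hypothetical output record
    with_payload = [y + 5.0 for y in measured]      # same motion, constant offset
    print(hankel(increments(measured), 2))
    print(hankel(increments(with_payload), 2))      # identical to the line above
```

Both printed matrices are the same, which is the intuition behind using increments: the data-driven model no longer depends on the unknown constant bias.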

RA-L 2026-06-12

Semantic-Geometric Task Representations for Bimanual Manipulation from Human Demonstrations to Robot Action Planning

Franziska Herbert, Vignesh Prasad, Han Liu, Dorothea Koert, Georgia Chalvatzaki

Manipulation and robot arms · Robot learning · Perception and sensing
Abstract

Learning structured task representations from human demonstrations is essential for bimanual manipulation, where action ordering, object involvement, and interaction geometry vary significantly across executions. A key challenge lies in jointly capturing the discrete semantic task structure and the temporal evolution of object-centric geometric relations in a form that supports reasoning over task progression. We introduce a semantic–geometric graph-based task representation that jointly encodes object identities, inter-object semantic relations, and per-object motion histories, via a Message Passing Neural Network (MPNN) encoder and a Transformer-based decoder. The encoder operates solely on the temporal scene graph, producing structured representations decoupled from action labels. The decoder then conditions on action-context to forecast future actions, associated objects, and object motions. This decoupling learns task-agnostic representations, enabling encoder reuse across embodiments through decoder-only finetuning on a small robot dataset. Across eleven bimanual tasks from two datasets, we find that the benefit of structured semantic–geometric representations over simpler sequence-based models grows with task variability in action ordering and object involvement. At deployment, a planner couples the action and motion predictions with learned Probabilistic Movement Primitives, achieving full task success on two real-robot bimanual tasks and outperforming graph ablations, Transformer, decoder-only, and finetuned vision-language model baselines. Website: https://frherbert.github.io/bimanual-task-graphs

RA-L 2026-06-12

SDWM: Learning Bipedal Locomotion via a Smooth Denoising World Model Method

Jie Xue, Zhiyuan Liang, Wencong Gan, Jimeng Xu, Qingdu Li, Fangyan Yang

Humanoid robots · Legged / quadruped robots · Robot learning
Abstract

Blind locomotion policies trained in simulation for unstructured terrains often suffer from significant performance degradation when deployed on real bipedal robots. A typical manifestation is the emergence of abnormal behaviors (e.g., sudden high-stepping motions) even on simple terrains like flat ground. This issue arises from environmental noise interference and the inability to reliably obtain certain critical state information (such as terrain heightmaps and linear velocities) through proprioception. Existing methods commonly attempt to address this problem by injecting large amounts of noise and learning explicit or implicit representations of the missing information. Nevertheless, excessive noise can severely restrict the potential performance of the policy. The more fundamental challenge lies in the fact that both explicit and implicit representations constructed solely from historical proprioceptive data are unable to accurately reconstruct key unobservable state variables. This inevitable estimation bias ultimately leads to abnormal behaviors. To address these challenges, we propose a bipedal training framework based on a Smoothed and Denoised World Model (SDWM), which explicitly mitigates the effects of observation noise, incomplete state information, and inaccurate reconstruction. We validate the effectiveness of SDWM through comparative tests conducted in simulation as well as indoor and outdoor real-world scenarios.

RA-L 2026-06-12

Online Lifelong Dynamic Learning Control for Manipulators With Closed Architecture in Multi-Tasking Environments

Mingyu Wang, Min Wang, Chenguang Yang

Manipulation & Robot Arms · Robot Learning · Control & Dynamics
Abstract

This paper proposes an online lifelong dynamic learning-based outer-loop velocity compensation control scheme for $n$-degree-of-freedom robotic manipulators operating in continuous multi-task environments. A radial basis function neural network (RBF NN) incorporating a neuron dynamic-growing strategy is employed in the actual controller, enabling real-time adjustment of neuron compact sizes according to NN inputs and facilitating online identification of unknown system dynamics. Furthermore, an online weight feedback mechanism is integrated into the neural network learning law to preserve previously learned weight parameters during task execution. By introducing an S-shaped filtering function, significant synaptic weights are assigned higher feedback gains, whereas less important weights are gradually suppressed toward zero, effectively mitigating catastrophic forgetting. In contrast to conventional dynamic learning control approaches, the proposed scheme enables the retrieval of historical knowledge when revisiting prior tasks online, thereby ensuring sustained control accuracy over time. Rigorous theoretical analysis demonstrates that all closed-loop signals remain uniformly bounded, and both weight estimation errors and system identification errors converge exponentially to a small residual neighborhood around zero. Finally, experiments on a UR5 robotic manipulator validate the effectiveness of the proposed method.

RA-L 2026-06-12

Enhancing Multi-Robot Exploration Using Probabilistic Frontier Prioritization with Dirichlet Process Gaussian Mixtures

John Lewis Devassy, Meysam Basiri, Mário A. T. Figueiredo, Pedro U. Lima

Drones / Aerial Robots · Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms
Abstract

Multi-agent autonomous exploration is essential for applications such as environmental monitoring, search and rescue, and industrial-scale surveillance. However, effective coordination under communication constraints remains a significant challenge. Frontier exploration algorithms analyze the boundary between the known and unknown regions to determine the next-best view that maximizes exploratory gain. This article proposes an enhancement to existing frontier-based exploration algorithms by introducing a probabilistic approach to frontier prioritization. By leveraging a Dirichlet process Gaussian mixture model (DP-GMM) and a probabilistic formulation of information gain, the method improves the quality of frontier prioritization. The proposed enhancement, integrated into two state-of-the-art multi-agent exploration algorithms, consistently improves performance across environments of varying clutter, communication constraints, and team sizes. Simulations showcase an average exploration time improvement of 10% and 14% for the two algorithms across all combinations. Successful deployment in real-world experiments with a dual-drone system further corroborates these findings.

RA-L 2026-06-12

Wearable Human–Drone Interface: Gesture-Based Control and Vibrotactile Spatial Awareness

Myeong-Ho Shin, Giancarlo Eder Guerra Padilla, Kee-Ho Yu

Drones / Aerial Robots · Manipulation & Robot Arms · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Manual multirotor piloting requires continuous visual monitoring and dual-stick manipulation, increasing cognitive workload and limiting usability for novice users and attention-constrained scenarios. This paper presents a bidirectional wearable human–drone interface that integrates an inertial measurement unit (IMU)-based gesture controller with an abdominal vibrotactile device to enable non-visual egocentric 3D relative-position awareness. The proposed GRU–ECOC gesture pipeline achieves 97.58% accuracy (macro F1 = 0.9773). A user-in-the-loop waypoint-following study with 12 novice users shows learnability, reducing completion time from 263 s to 186 s over five trials. For spatial feedback, we implement a 3-by-12 vibrotactile device that conveys egocentric 3D position and achieves 82.75% complete 3D recognition accuracy in an indoor study with 12 participants and 1,200 trials. At the integrated level, a real-drone point-to-point task compares Visual-only and Tactile-only operation under the same gesture-control condition. Tactile-only operation remains feasible but slower than the Visual-only baseline, with 8/9 successes and a mean completion time of 45.1 s compared with 9/9 successes and 31.4 s for Visual-only operation. A separate sensory-blocked forward-position holding task further demonstrates the feasibility of non-visual closed-loop recovery, with successful recovery in all 8 boundary exit events. These results support the feasibility of bidirectional wearable human–drone interaction while clarifying the cost of tactile-only state feedback relative to visual operation.

Sci. Robotics 2026-05-20

The Moon needs robots

Robin R. Murphy

Abstract

Four science fiction works describe realistic construction and mining robots enabling human habitation of the Moon.

T-RO 2026-05-22

Future-Trend-Aware Filter-Based PD-MRAC Method for Quadrotors With Unknown Strong Disturbances

Yanhua Yang, Chenxin Yu, Xiongtao Shi, Changchun Hua, James Lam, Youmin Gong, et al.

Drones / Aerial Robots · Control & Dynamics
Abstract

Robust flight in complex and windy environments is critical for both single and multiple quadrotors. Existing methods either learn a disturbance model at high computational cost or use error-based adaptive control with a speed-stability trade-off that makes tuning difficult. To address these issues, this paper proposes a future-trend-aware filter-based PD-MRAC (Proportional-Derivative Model Reference Adaptive Control) for a single quadrotor and a distributed PD-MRAC for multi-quadrotor formations. By embedding a trend-aware derivative term in the adaptive update laws, the controller obtains anticipatory information about the error evolution, enabling rapid adaptation while mitigating oscillations. For more disturbance-sensitive multi-quadrotor systems, we design a robust distributed protocol under a directed graph, improving resilience to disturbances. The approach maintains low computational cost and supports fast adaptive updates. Extensive simulations and real-world experiments validate the improvement. For a single quadrotor, RMSE is reduced by around 57% versus the baselines and by around 12% versus the DJI Mavic 2. For multi-quadrotor formations, results show enhanced robustness in simulation and effective real-world indoor/outdoor experiments under strong winds. Our project page is at https://xiongtao-shi.github.io/PD-MRAC/.

T-RO 2026-05-22

Simplifying Robotic Ultrasound Calibration via Conic Sections Geometry

Zixing Jiang, Yingbai Hu, Yichong Sun, Zheng Li

Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano
Abstract

Robotic ultrasound (US) systems represent an emerging frontier in medical imaging. A fundamental component of these systems is the rigid body transformation between the robot flange and the attached US probe, which enables mapping of visual data from image space to the robot's reference frame. Traditionally, calibrating this transformation has been a tedious process, complicated by equipment demands and operational constraints arising from the probe's narrow field of view. This work presents a novel calibration strategy based on conic sections geometry, which offers several key simplifications over existing approaches: (1) it requires no external equipment beyond a single cone phantom; (2) it operates with a small input size and imposes no strict alignment or motion constraints on the US scan plane during calibration; and (3) it employs a straightforward pattern analysis pipeline to process images acquired from phantom scans. Experimental validation results show that the proposed method achieves accuracy comparable to existing state-of-the-art approaches while delivering superior precision, thereby demonstrating enhanced calibration reproducibility enabled by its streamlined workflow. These advantages make this method particularly suitable for application in clinical scenarios that require frequent and efficient calibration.

IJRR 2026-05-19

Bioinspired multisegment knee exoskeletons with variable stiffness and kinematic compatibility

Ming Xu, Zhihao Zhou, Wenjie Lou, Xiaolin Dai, Run Wang, Sunil K. Agrawal, et al.

Legged / Quadruped Robots · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract

The knee joint plays a critical role in locomotion but is susceptible to overuse injuries, motivating the development of assistive exoskeletons. Current designs face a fundamental trade-off between achieving kinematic compatibility with the knee’s complex polycentric motion and providing effective variable-stiffness functionality for biomechanical support. This study presents a novel cable-driven multisegment exoskeleton to reconcile these competing requirements through an integrated biomimetic design. The proposed system employs redundant rotational joints and a linear guide rail to passively accommodate natural joint kinematics while enabling wide-range stiffness regulation (0–207 Nm/rad) via active cable length adjustment. This single-actuator approach achieves dynamic stiffness regulation, deterministic torque transmission with an effective moment arm exceeding 70 mm, and seamless state modulation within a low-profile structure (0.63 kg). Benchtop characterization confirmed precise stiffness control across the operational range (RMSE ≤ 0.035 Nm/rad). Human subject experiments revealed significant muscular effort reduction during demanding tasks without compromising natural joint kinematics, including a 23.9% decrease in peak vastus lateralis activation during incline walking and a 29.2% reduction during squatting compared to unassisted conditions. These results validate the exoskeleton’s ability to reconcile anatomical compatibility with physiologically relevant stiffness regulation, representing a significant advance in knee assistive technology with broad applications in clinical rehabilitation and physical performance augmentation. This study bridges a critical gap in knee exoskeleton development, offering a unified solution for comfortable and effective assistance across dynamic tasks.

Sci. Robotics 2026-05-13

Cross-link collective: Entangled robotic matter with cohesive motion

Danna Ma, Baxi Chong, Daniel I. Goldman, Kirstin H. Petersen

Multi-Robot / Swarms
Abstract

Robotic applications increasingly demand systems that are resilient, adaptable, and scalable. One promising route is through collectives of simple modules, where complex group-level behavior emerges from local interactions. By omitting fixed topologies and tight coordination, this approach sacrifices predictability and conventional tools for behaviors inherently optimized through stochastic mechanical interactions. A key challenge is maintaining cohesion and functionality without fixed connections and explicit coordination. We introduce the cross-link collective, a physically entangled robotic system inspired by cross-linking in active gels. Through shape morphing and transient entanglement, individually immobile modules produce sustained collective motion. The mechanically intelligent robot matter favors chains and phase relationships that reduce joint torques and reconfigures in response to perturbations. We show that distributed control can be added to this substrate to further enhance cohesion. Leveraging weak, reversible connections, the cross-link collective is adaptable, scalable, and fault tolerant, offering insights to applications from soft matter and robotics.

RA-L 2026-06-15

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

Ali Tourani, Saad Ejaz, Hriday Bavle, Miguel Fernandez-Cortizas, David Morilla-Cabello, Jose Luis Sanchez-Lopez, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Current Visual Simultaneous Localization and Mapping (VSLAM) systems often struggle to create maps that are both semantically rich and easy to interpret. While incorporating semantic scene knowledge helps build richer maps with contextual associations among mapped objects, representing them in structured formats such as scene graphs has not been widely addressed, leading to complex map comprehension and limited scalability. This paper introduces vS-Graphs, a novel real-time VSLAM framework that integrates vision-based scene understanding with map reconstruction and comprehensible graph-based representation. The framework infers structural elements (i.e., rooms and floors) from detected building components (i.e., walls and ground surfaces) and incorporates them into optimizable 3D scene graphs. This solution enhances the reconstructed map's semantic richness, comprehensibility, and localization accuracy. Extensive experiments on standard benchmarks and real-world datasets demonstrate that vS-Graphs achieves an average of 15.22% accuracy gain across all tested datasets compared to state-of-the-art VSLAM methods. Furthermore, the proposed framework achieves environment-driven semantic entity detection accuracy comparable to that of precise LiDAR-based frameworks, using only visual features. The code is publicly available at https://github.com/snt-arg/visual_sgraphs and is actively being improved.

RA-L 2026-06-15

Efficient Data-driven Reference Governor Design for Safe Evasive Manoeuvring

Petar Velchev, Alberto Bertipaglia, Felipe Santafe, Mohammad Khosravi, Barys Shyrokau

Perception & Sensing · Control & Dynamics
Abstract

This paper presents a novel data-driven Reference Governor with Model Predictive Control, integrating local motion replanning and path following for collision avoidance. Employing a model-free Reference Governor, the proposed framework utilises system knowledge through Bayesian Optimisation to augment predetermined evasive trajectories, minimising path-following errors and simultaneously ensuring obstacle safety margins. A single-track vehicle model in combination with a nonlinear tyre model is used to capture the vehicle's dynamics. The optimised control action is the vehicle steering angle, whilst the Reference Governor optimises parameters of a sigmoid reference signal to minimise the tracking error and guarantee safety with respect to obstacles in emergency manoeuvres. The proposed approach is evaluated on a single lane change using a high-fidelity simulation environment, and its performance is compared to a baseline controller integrating path following and obstacle avoidance. The results demonstrate a 14% reduction in safety critical overshoot, maximising obstacle safety distance and a four times lower controller cycle time compared to the baseline. Furthermore, through a robustness analysis, it is demonstrated that the proposed approach is more robust towards model mismatches and perception-based errors, as seen by average 30% and 40% reductions in near-miss and collision rates.
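
To make the "sigmoid reference signal" idea concrete, here is a minimal Python sketch of a logistic lane-change reference and a clearance check against a point obstacle. It is an illustration under stated assumptions, not the paper's method: the parameter names (`lateral_shift`, `steepness`, `midpoint`), the obstacle position, and the numbers are all hypothetical, and a plain grid search stands in for the Bayesian Optimisation used by the authors.

```python
import math

def sigmoid_reference(x, lateral_shift, steepness, midpoint):
    """Lateral offset y(x) of a logistic (sigmoid) lane-change reference."""
    return lateral_shift / (1.0 + math.exp(-steepness * (x - midpoint)))

def min_clearance(steepness, obstacle, xs, lateral_shift=3.5, midpoint=25.0):
    """Smallest distance from the sampled reference path to a point obstacle."""
    ox, oy = obstacle
    return min(
        math.hypot(x - ox, sigmoid_reference(x, lateral_shift, steepness, midpoint) - oy)
        for x in xs
    )

def gentlest_safe_steepness(candidates, obstacle, xs, margin):
    """Grid-search stand-in for the optimiser: return the smallest steepness
    (least aggressive manoeuvre) whose clearance meets the safety margin."""
    for k in sorted(candidates):
        if min_clearance(k, obstacle, xs) >= margin:
            return k
    return None  # no candidate satisfies the margin

xs = [0.5 * i for i in range(121)]   # longitudinal samples from 0 to 60 m
obstacle = (35.0, 0.0)               # hypothetical obstacle in the original lane
best = gentlest_safe_steepness([0.1, 0.2, 0.4, 0.8], obstacle, xs, margin=3.0)
print(best, min_clearance(best, obstacle, xs))
```

The sketch only shapes the reference; in the paper a Model Predictive Controller then tracks it by commanding the steering angle.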

RA-L 2026-06-15

Robust Data-Driven Path Tracking Control for Autonomous Vehicles: A Koopman Operator Approach With High-Order Super-Twisting Observer

Shaobo Liang, Shuguo Pan, ZongLiang Chen, Wang Gao, Xianlu Tao

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Path tracking control is a critical component of autonomous driving technology. However, it faces significant challenges due to the inherent strong nonlinearity and dynamic uncertainties of autonomous vehicles (AVs), which remains a severe challenge in the field. To address the problem, this paper proposes a novel robust data-driven control framework designed to balance modeling accuracy with disturbance rejection performance. First, a control-oriented Deep Koopman model is constructed to map the nonlinear AV path-tracking error dynamics into a high-dimensional lifted space. Second, to address inevitable modeling residuals and external disturbances, a high-order super-twisting observer (HOSTO) is designed within the lifted space, which guarantees finite-time convergence of disturbance estimation, providing strong robust compensation. Finally, an approximate lifted-space MPC with Jacobian-based cost mapping is formulated as a standard Quadratic Program (QP), avoiding repeated nonlinear decoding during online optimization. The effectiveness of the proposed method is validated through comprehensive simulations and experiments, showing superior control accuracy and robustness against disturbances compared to existing algorithms.

RA-L 2026-06-15

Scaling Rough Terrain Locomotion With Automatic Curriculum Reinforcement Learning

Ziming Li, Chenhao Li, Marco Hutter

Legged / Quadruped Robots · Robot Learning
Abstract

Curriculum learning has demonstrated substantial effectiveness in robot learning. However, it still faces limitations when scaling to complex, wide-ranging task spaces. Such task spaces often lack a well-defined difficulty structure, making the difficulty ordering required by previous methods challenging to define. We propose a Learning Progress-based Automatic Curriculum Reinforcement Learning (LP-ACRL) framework, which estimates the agent's learning progress online and adaptively adjusts the task-sampling distribution, thereby enabling automatic curriculum generation without prior knowledge of the difficulty distribution over the task space. Policies trained with LP-ACRL enable the ANYmal D quadruped to achieve and maintain stable, high-speed locomotion at 2.5 m/s linear velocity and 3.0 rad/s angular velocity across diverse terrains, including stairs, slopes, gravel, and low-friction flat surfaces, whereas previous methods have generally been limited to high speeds on flat terrain or low speeds on complex terrain. Experimental results demonstrate that LP-ACRL exhibits strong scalability and real-world applicability, providing a robust baseline for future research on curriculum generation in complex, wide-ranging robotic learning task spaces.
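
The core loop described above (estimate learning progress online, then reshape the task-sampling distribution) can be sketched in a few lines of Python. The abstract does not specify the estimator, so this sketch assumes a common choice: learning progress is the gap between a fast and a slow exponential moving average of each task's score. The class name, the smoothing rates, and the `floor` exploration term are hypothetical.

```python
import random

class LearningProgressSampler:
    """Samples tasks in proportion to an estimate of learning progress."""

    def __init__(self, num_tasks, fast=0.3, slow=0.05, floor=0.05, seed=0):
        self.fast, self.slow, self.floor = fast, slow, floor
        self.fast_avg = [0.0] * num_tasks   # reacts quickly to new scores
        self.slow_avg = [0.0] * num_tasks   # lags behind, acts as a baseline
        self.rng = random.Random(seed)

    def update(self, task, score):
        """Fold one episode score for `task` into both moving averages."""
        self.fast_avg[task] += self.fast * (score - self.fast_avg[task])
        self.slow_avg[task] += self.slow * (score - self.slow_avg[task])

    def probabilities(self):
        """Task distribution: |fast - slow| plus a floor that keeps exploring."""
        weights = [abs(f - s) + self.floor
                   for f, s in zip(self.fast_avg, self.slow_avg)]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self):
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

sampler = LearningProgressSampler(num_tasks=3)
for step in range(1, 11):
    sampler.update(0, 0.0)           # task 0: no success yet, no progress
    sampler.update(1, step / 10.0)   # task 1: scores are improving
    sampler.update(2, 0.0)           # task 2: no success yet, no progress
print(sampler.probabilities())
```

Tasks whose scores are changing receive more probability mass, so no hand-written difficulty ordering is needed.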

RA-L 2026-06-15

Inverse Kinematics of Continuum Robots: A Scale-Aware Empirical Analysis of Finite-Sample Set-Valued Structure

Achille Melingui, Joseph Jean-Baptiste Mvogo Ahanda, Elizabeth Von-Kiti, Rochdi Merzouki

Robot Learning · Medical / Soft / Micro-Nano
Abstract

Inverse kinematics (IK) of continuum robots is inherently set-valued: a single task-space target corresponds to a family of feasible configurations. However, in practice, this structure is only observable through finite samples and tolerance-based approximations, making it dependent on sampling density, tolerance, and estimator design. In this work, IK is studied as a conditional solution-set problem by approximating inverse neighborhoods from dense configuration sampling and analyzing their structure using graph-based connectivity and intrinsic dimension (ID) estimators. Experiments on constant-curvature robots with increasing redundancy ($n=3,6,9$), under both position-only and full-pose ($SE(3)$) formulations, show that inverse neighborhoods exhibit stable low-dimensional structure despite increasing actuation dimension. In contrast, connectivity estimates are highly scale-dependent: fixed-radius graphs produce strong fragmentation, whereas robust graph constructions reveal neighborhoods dominated by a single connected component. Increasing the sampling density further shows that the intrinsic dimension and the dominant-component structure remain stable, while apparent separation decreases. To assess functional implications, a trajectory-level path-lifting experiment is introduced and validated on CBHA data, demonstrating continuous inverse solutions with low tracking error and no branch switching. These results suggest that the observed fragmentation is largely a finite-sample and estimator-dependent effect. This motivates modeling IK as a conditional distribution $p(q \mid y)$ rather than a deterministic function, providing empirical support for generative approaches such as conditional normalizing flows.
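
The notion of an inverse neighborhood approximated from configuration samples can be illustrated with a deliberately simplified Python sketch. It assumes a planar robot made of three constant-curvature segments of equal length and a position-only target; the paper works with spatial robots and richer estimators. The segment length, sampling range, and tolerance below are hypothetical.

```python
import math
import random

SEGMENT_LENGTH = 1.0 / 3.0  # three equal segments, total length 1 (assumed)

def tip_position(curvatures):
    """Planar constant-curvature forward kinematics: curvatures -> tip (x, y)."""
    x = y = heading = 0.0
    for kappa in curvatures:
        bend = kappa * SEGMENT_LENGTH
        if abs(kappa) < 1e-9:               # straight-segment limit
            dx, dy = SEGMENT_LENGTH, 0.0
        else:                               # circular arc in the local frame
            dx = math.sin(bend) / kappa
            dy = (1.0 - math.cos(bend)) / kappa
        x += math.cos(heading) * dx - math.sin(heading) * dy
        y += math.sin(heading) * dx + math.cos(heading) * dy
        heading += bend
    return x, y

def inverse_neighborhood(target, samples, tol):
    """All sampled configurations whose tip lies within `tol` of `target`."""
    neighbors = []
    for q in samples:
        x, y = tip_position(q)
        if math.hypot(x - target[0], y - target[1]) <= tol:
            neighbors.append(q)
    return neighbors

rng = random.Random(0)
samples = [tuple(rng.uniform(-6.0, 6.0) for _ in range(3)) for _ in range(20000)]
target = tip_position((2.0, -1.0, 1.5))
neighbors = inverse_neighborhood(target, samples, tol=0.03)
print(len(neighbors), "sampled configurations reach the target within tolerance")
```

Because three curvatures map to a two-dimensional tip position, the retained configurations trace out a family rather than a single solution, and how connected that family looks depends on the sample count and the tolerance.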

IJRR 2026-05-30

Designing standard library of manipulation skill-agents for Learning-from-Observation

Jun Takamatsu, Daichi Saito, Katsushi Ikeuchi, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake

Manipulation & Robot Arms
Abstract

To use new robot hardware, it is necessary to develop a control program tailored to the specific robot. Considering the reusability of software among robots is crucial for minimizing the effort involved in this process and maximizing software reuse across different robots. This paper proposes a method to reduce the effort of software development for a specific robot by considering hardware-level reusability, using a Learning-from-Observation (LfO) framework with a pre-designed skill library. The LfO framework first represents the demonstrated actions in hardware-independent representations, referred to as task models, from observing human demonstrations, and captures the necessary parameters for the interaction between the environment and the robot (Ikeuchi et al., 2024). Then, for executing the demonstrated actions, a set of skill agents is employed to convert the representations of the task models into robot commands. This paper focuses on the latter part of the LfO framework and explores a hardware-independent design approach for these skill agents. In particular, we focus on a manipulation skill library. To achieve this, we describe these skill agents in a hardware-independent manner based on the physical characteristics of a grasped object and an environment. This paper first defines a necessary and sufficient skill-agent set that covers all possible actions, and considers the design principles for these skill agents. We provide concrete examples of such skill agents and demonstrate the practicality of these skill agents by showing that the same representations can be executed on two different robots, Nextage and Fetch, and two different end-effectors, Shadow Hand-Lite and parallel gripper.

T-RO 2026-05-26

Fast and Scalable Game-Theoretic Trajectory Planning with Intentional Uncertainties

Zhenmin Huang, Yusen Xie, Benshan Ma, Shaojie Shen, Jun Ma

Multi-Robot / Swarms
Abstract

Trajectory planning involving multi-agent interactions has been a long-standing challenge in the field of robotics, primarily burdened by the inherent yet intricate interactions among agents. While game-theoretic methods are widely acknowledged for their effectiveness in managing multi-agent interactions, significant impediments persist when it comes to accommodating the intentional uncertainties of agents. Under intentional uncertainties, existing game-theoretic methods incur heavy computational burdens, leading to inefficiencies and poor scalability. In this paper, we propose a novel game-theoretic interactive trajectory planning method to effectively address the intentional uncertainties of agents, and it demonstrates both high efficiency and enhanced scalability. As the underpinning basis, we model the interactions between agents under intentional uncertainties as a static Bayesian game, and we show that its agent-form equivalence can be represented as a potential game under certain assumptions. The existence and attainability of the optimal interactive trajectories are illustrated, as the corresponding static Bayesian Nash equilibrium can be attained by optimizing a unified optimization problem. Additionally, we present a distributed algorithm based on the dual consensus alternating direction method of multipliers (ADMM) tailored to the parallel solving of the problem, thereby significantly improving the scalability. The attendant outcomes from simulations and experiments demonstrate that the proposed method is effective across a range of scenarios characterized by general forms of intentional uncertainties. Its scalability surpasses that of existing centralized and decentralized baselines, allowing for real-time interactive trajectory planning in uncertain game settings. The source code will be available at https://github.com/zhuangdf/Potential-Bayesian-Game-release.

T-RO 2026-05-26

6D Tip Wrench Estimation for Continuum Robots: A Koopman-UKF-Wrench Decomposition Approach

Lingyun Zeng, S.M.Hadi Sadati, Lukas Lindenroth, Christos Bergeles

Medical / Soft / Micro-Nano
Abstract

This paper presents a method for comprehensive 6D estimation of tip wrench for generally deflected static elastic rods. Current methods for load estimation are restricted to estimating lateral (point or distributed) forces for (quasi-)planar deformation; estimation of tangential force and moment, i.e., the full 6D wrench, remains largely unreliable due to the ill-posed nature of the problem. To address this challenge, this paper begins by proposing a high-fidelity static rod model that leverages Koopman Operator theory. Building on this model and utilizing shape feedback, a computationally efficient three-step wrench estimator is proposed: (i) a Koopman-UKF local moment observer, (ii) a static equilibrium solver, and (iii) a rod model propagator. Then, a 2D wrench screw system, identified as the insensible wrench in the initial estimation, elucidates error sources and informs strategies to enhance accuracy by incorporating additional feedback, such as the rod tip material frame and base axial force. Ultimately, the framework delivers accurate 6D tip wrench estimation with quantified uncertainty. Simulation and experimental evaluations validate its effectiveness, demonstrating mean errors of $53.14\,\mathrm{mN}$ (1.94%) and $2.65\,\mathrm{mNm}$ (7.18%) for a $159\,\mathrm{mm}$-long Nitinol tube undergoing complex out-of-plane deformations, outperforming three replicated state-of-the-art methods. Additionally, its applicability to more complex continuum robots is demonstrated through load estimation on a Parallel Continuum Robot.

RA-L 2026-06-08

Strategizing at Speed: A Learned Model Predictive Game for Multi-Agent Drone Racing

Andrei-Carlo Papuc, Lasse Peters, Sihao Sun, Laura Ferranti, Javier Alonso-Mora

Drones / Aerial Robots · Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms · Control & Dynamics
Abstract

Autonomous drone racing pushes the boundaries of high-speed motion planning and multi-agent strategic decision-making. Success in this domain requires drones not only to navigate at their limits but also to anticipate and counteract competitors' actions. In this paper, we study a fundamental question that arises in this domain: how deeply should an agent strategize before taking an action? To this end, we compare two planning paradigms: the Model Predictive Game (MPG), which finds interaction-aware strategies at the expense of longer computation times, and contouring Model Predictive Control (MPC), which computes strategies rapidly but does not reason about interactions. We perform extensive experiments to study this trade-off, revealing that MPG outperforms MPC at moderate velocities but loses its advantage at higher speeds due to latency. To address this shortcoming, we propose a Learned Model Predictive Game (LMPG) approach that amortizes model predictive gameplay to reduce latency. In both simulation and hardware experiments, we benchmark our approach against MPG and MPC in head-to-head races, finding that LMPG outperforms both baselines.

RA-L 2026-06-08

Prescribed Performance-Based Data-Driven Adaptive Admittance Control for Physical Human-Robot Interaction

Zi-Yuan Dong, Xinyi Yu, Mingqing Lin, Linlin Ou

Robot Learning · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract

It has been a critical challenge to provide reliable performance constraints while ensuring safe stability for robotic systems, particularly in relatively complex human-robot interaction tasks. To overcome the limitation in traditional controllers where control accuracy is highly dependent on the dynamic model, a novel neural network-based data-driven sliding mode controller (NNDSMC) is proposed. It optimizes control strategies in real time by constructing a dynamic data model while considering interaction forces. On this basis, a data-driven prescribed performance control (DPPMC) method is proposed based on quadratic programming (QP), where faster transient response and greater steady-state tracking accuracy are guaranteed with the designed data-driven high-order control barrier function (DHCBF). To further realize compliant human-robot interaction, a novel data-driven adaptive admittance controller is developed, which combines the flexibility of model predictive control with the fast convergence of DPPMC. The whole design process does not involve any model information. The feasibility of the proposed method is validated through the Franka-Panda robot with seven degrees of freedom.

RA-L 2026-06-08

Collective Intelligent Reinforcement Learning for Biomimetic Clapping Trajectory Optimization

Zhenyao Zhao, He Shen, Ni Li, Yixin Yang

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Multi-Robot / Swarms · Control & Dynamics
Abstract

The foreflippers of sea lions enable highly efficient and agile locomotion through a unique clapping propulsion mechanism. Nonetheless, identifying the optimal kinematics for these movements constitutes a significant challenge, owing to the highly nonlinear and dynamic fluid-structure interactions. To address this, we present a collective intelligent reinforcement learning method for generating clapping trajectories of a bionic foreflipper. The approach begins by modeling biological motions using periodic triaxial motions based on closed Bézier curves, enabling the representation of essential clapping dynamics with a small set of parameters. Subsequently, we propose a collective intelligent reinforcement learning algorithm to efficiently optimize the parameterized clapping trajectory. A comparative evaluation against conventional optimization algorithms reveals accelerated convergence and superior performance. The optimal trajectories are validated against biological motion data, confirming their physiological plausibility. Simulation results show a significant 75% improvement in thrust impulse and a 70% reduction in convergence time compared to the baseline reinforcement learning method.
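
A closed Bézier curve gives a compact way to describe a periodic three-axis motion with a handful of control points. The Python sketch below shows one plausible construction, assuming the curve is closed by reusing the first control point as the last one and that each control point holds three joint angles; the abstract does not give the actual parameterization, and the function names and angle values are hypothetical.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] with de Casteljau's algorithm."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Repeated linear interpolation between neighbouring points.
        points = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]

def clapping_angles(free_points, phase):
    """Periodic triaxial angles: the curve is closed by appending the first
    control point at the end, and the phase wraps around every cycle."""
    closed = list(free_points) + [free_points[0]]
    return bezier_point(closed, phase % 1.0)

# Hypothetical control points: (angle_1, angle_2, angle_3) in degrees.
free_points = [(0.0, 0.0, 0.0), (30.0, 10.0, -5.0),
               (10.0, 40.0, 20.0), (-20.0, 15.0, 5.0)]
for phase in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(phase, clapping_angles(free_points, phase))
```

With this construction the optimizer only has to tune the free control points. The trajectory is continuous across cycles, though its velocity may jump at the wrap-around unless the neighbouring control points are also constrained.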

RA-L 2026-06-08

E$^{2}$Depth: Efficient Self-Supervised Surround-View Depth Estimation With Explicit Geometric Enhancement

Sheng Zhang, Juan Li, Chang Liu, Chang Liu, Jie Li, Dongxiao Yang

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Surround-view depth estimation is critical for robotic perception and autonomous driving. Existing methods rely on implicit cross-view feature learning, which incurs a high computational cost. In this paper, we propose E$^{2}$Depth, an efficient self-supervised surround-view depth estimation framework that explicitly leverages geometric and signal priors at each pipeline stage. First, hierarchical volumetric fusion is performed to align multi-level features across cameras with controlled memory consumption. Second, wavelet-domain edge enhancement is introduced to recover sharper depth boundaries without external supervision. Finally, an explicit pose estimation network guided by noisy motion priors is designed to stabilize training and improve depth scale reliability. Extensive experiments on the DDAD and nuScenes benchmarks demonstrate that E$^{2}$Depth achieves a favorable trade-off between accuracy and efficiency, supporting practical deployment on real-world platforms.

RA-L 2026-06-08

Robust Rigid Body Pose Estimation for Sparse Point Sets Under Heteroscedastic Noise

Hao Wang, Taogang Hou, Tianmiao Wang, Buhui Jiang

Drones / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Rigid body pose estimation from sparse point correspondences is important in robotics but remains challenging when measurements are affected by heteroscedastic noise and occasional outliers. This issue is particularly evident in marker-based tracking of fast-moving platforms, where localization uncertainty may vary substantially across markers due to motion blur, occlusion, and sensing artifacts. To address this setting, we develop a robust registration framework tailored to sparse and structured point sets by modeling residuals with a heavy-tailed Student's $t$-distribution. Using its Gaussian scale mixture representation, the estimation problem is formulated as a maximum-likelihood objective with latent correspondence reliabilities, and solved via an iteratively reweighted least-squares procedure with a graduated non-convexity schedule. The resulting method adaptively down-weights unreliable measurements while retaining the efficiency of weighted rigid alignment. Simulation results across multiple noise models and outlier ratios show that the proposed approach achieves consistently low rotation and translation errors in sparse regimes. Real-world experiments on high-speed quadrotor tracking further demonstrate improved geometric consistency and reduced trajectory jerk relative to standard least-squares estimation.
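
The recipe described above (Student's $t$ residual model, iteratively reweighted least squares, an annealed scale) can be sketched in plain Python. This is a simplified planar (2D) illustration rather than the authors' 3D implementation: the weight formula $w_i = (\nu + d)/(\nu + r_i^2/\sigma^2)$ is the standard Gaussian-scale-mixture weight for a $d$-dimensional Student's $t$ model, and the annealing schedule, default parameters, and function names are assumptions.

```python
import math

def weighted_rigid_2d(src, dst, w):
    """Closed-form weighted rigid alignment in the plane: dst ~ R(theta) src + t."""
    sw = sum(w)
    pcx = sum(wi * p[0] for wi, p in zip(w, src)) / sw
    pcy = sum(wi * p[1] for wi, p in zip(w, src)) / sw
    qcx = sum(wi * q[0] for wi, q in zip(w, dst)) / sw
    qcy = sum(wi * q[1] for wi, q in zip(w, dst)) / sw
    s_cross = s_dot = 0.0
    for wi, (px, py), (qx, qy) in zip(w, src, dst):
        ax, ay, bx, by = px - pcx, py - pcy, qx - qcx, qy - qcy
        s_cross += wi * (ax * by - ay * bx)
        s_dot += wi * (ax * bx + ay * by)
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy)

def squared_residuals(src, dst, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * px - s * py + tx - qx) ** 2 + (s * px + c * py + ty - qy) ** 2
            for (px, py), (qx, qy) in zip(src, dst)]

def student_t_irls(src, dst, nu=3.0, sigma=0.05, sigma0=1.0, decay=0.5, iters=30):
    """IRLS with Student's t weights; the scale is annealed from sigma0 down
    to sigma as a simple graduated non-convexity schedule (assumed)."""
    w, scale = [1.0] * len(src), sigma0
    for _ in range(iters):
        theta, tx, ty = weighted_rigid_2d(src, dst, w)
        r2 = squared_residuals(src, dst, theta, tx, ty)
        w = [(nu + 2.0) / (nu + r / (scale * scale)) for r in r2]  # d = 2
        scale = max(sigma, scale * decay)
    return theta, tx, ty, w

# Synthetic check: exact correspondences plus one gross outlier (last point).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]
c, s = math.cos(0.5), math.sin(0.5)
dst = [(c * x - s * y + 1.0, s * x + c * y - 2.0) for x, y in src]
dst[-1] = (dst[-1][0] + 3.0, dst[-1][1] + 3.0)
theta, tx, ty, weights = student_t_irls(src, dst)
print(theta, tx, ty)
```

The outlier's weight collapses toward zero over the iterations, so the final alignment is driven almost entirely by the consistent correspondences.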

RA-L 2026-06-08

A Unified Multi-Dynamics Framework for Perception-Oriented Modeling in Tendon-Driven Continuum Robots

Ibrahim Alsarraj, Yuhao Wang, Abdalla Swikir, Cesare Stefanini, Dezhen Song, Zhanchi Wang, et al.

Manipulation and Robotic Arms · Perception and Sensing · Medical / Soft / Micro-Nano · Control and Dynamics
Abstract

Tendon-driven continuum robots offer intrinsically safe and contact-rich interactions owing to their kinematic redundancy and structural compliance. However, their perception often depends on external sensors, which increase hardware complexity and limit scalability. This work introduces a unified deterministic multi-dynamics modeling framework for tendon-driven continuum robotic systems, exemplified by a spiral-inspired robot named Spirob. The framework integrates motor electrical dynamics, motor–winch dynamics, and continuum robot dynamics into a coherent system model. Within this framework, motor signals such as current and angular displacement are modeled to expose the electromechanical signatures of external interactions, enabling perception grounded in intrinsic dynamics. The model captures and validates key physical behaviors of the real system, including actuation hysteresis and self-contact at motion limits. Building on this foundation, the framework is applied to environmental interaction: first for passive contact detection, verified experimentally against simulation data; then for active contact sensing, where control and perception strategies from simulation are successfully applied to the real robot; and finally for object size estimation, where a policy learned in simulation is directly deployed on hardware. The results demonstrate that the proposed framework provides a physically grounded way to interpret interaction signatures from intrinsic motor signals in tendon-driven continuum robots.

RA-L 2026-06-08

Structure-Aware Diffusion Policy for Bimanual Cooperative Manipulation

Xiao Li, Jie Zhang, Lequn Fu, Leyuan Gu, Youjun Xiong, Shiqi Li

Manipulation and Robotic Arms · Robot Learning · Perception and Sensing · Multi-Robot / Swarm · Control and Dynamics
Abstract

Bimanual cooperative manipulation involves critical closed-chain contact phases in which task success depends on maintaining inter-arm geometric consistency while mitigating antagonistic interaction. We present a structure-aware diffusion policy that conditions action generation on an explicit cooperative representation, rather than inferring coordination implicitly from raw multimodal inputs. The representation combines (i) Cooperative Dual Task-Space (CDTS) descriptors computed from forward kinematics, which decompose bimanual geometry into absolute motion and relative configuration, and (ii) a gravity-compensated force-difference cue computed from left–right end-effector forces to capture interaction inconsistency. To integrate perception with these structured priors, we design a hierarchical fusion module in which bidirectional vision–force cross-attention first aligns cross-modal evidence, followed by CDTS-guided cross-attention that uses structural tokens as queries to extract coordination-relevant features. Real-robot experiments on dual-arm cooperative carrying across objects of varying sizes and appearances demonstrate improved closed-chain stability and more robust performance across the evaluated object variations, increasing the overall success rate from 58% to 89% compared with a vision-dominant diffusion baseline.

RA-L 2026-06-08

Data-Driven Compensation Based on the Kinematic Model of a Tendon-Driven Continuum Robot for Ankle Motion Reproduction

Junyu Zhang, Xuanquan Wang, Jian Li, Zhen Chen, Zhen Li, Jingyu Pan

Manipulation and Robotic Arms · Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano
Abstract

In this letter, we propose a biomimetic ankle prototype based on a tendon-driven continuum robot for reproducing ankle motions associated with the rotating–pulling–poking manipulation, providing an experimental platform for related motion analysis and research. To address the non-ideal transmission problem in ankle motion reproduction, we develop a data-driven compensation framework based on the kinematic model, in which a mapping from driving inputs to end-effector pose is established using Cosserat rod theory and the geometrically variable strain model to support the compensation framework. Simulation and prototype experiments show that the proposed framework can reproduce representative ankle configurations and motion trajectories on the platform. After compensation, the angle error is reduced to within 0.1°, and the end-position error does not exceed 10% of the backbone length, verifying the effectiveness of the proposed framework.

RA-L 2026-06-08

FastStair: Learning to Run up Stairs With Humanoid Robots

Yan Liu, Tao Yu, Haolin Song, Hongbo Zhu, Nianzong Hu, Yuzhi Hao, et al.

Humanoid Robots · Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Control and Dynamics
Abstract

Running up stairs is effortless for humans but remains extremely challenging for humanoid robots due to the simultaneous requirements of high agility and strict stability. Model-free reinforcement learning (RL) can generate dynamic locomotion, yet implicit stability rewards and heavy reliance on task-specific reward shaping tend to result in unsafe behaviors, especially on stairs; conversely, model-based foothold planners encode contact feasibility and stability structure, but enforcing their hard constraints often induces conservative motion that limits speed. We present FastStair, a planner-guided, multi-stage learning framework that reconciles these complementary strengths to achieve fast and stable stair ascent. FastStair integrates a parallel model-based foothold planner into the RL training loop to bias exploration toward dynamically feasible contacts and to pretrain a safety-focused base policy. To mitigate planner-induced conservatism and the discrepancy between low- and high-speed action distributions, the base policy is fine-tuned into speed-specialized experts. These experts are retained as separate rule-switched branches, each equipped with independent LoRA layers that are co-fine-tuned to smooth the switching transition, yielding a single deployable controller that operates reliably across the full commanded-speed range. We deploy the resulting controller on the Oli humanoid robot, achieving stable stair ascent at commanded speeds up to 1.65 m/s and traversing a 33-step spiral staircase (17 cm rise per step) in 12 s, demonstrating robust high-speed performance on long staircases. Project page: https://npcliu.github.io/FastStair.

RA-L 2026-06-08

PGDP: Physics-Guided Diffusion Policy for Multi-Rate Control in Contact-Rich Manipulation

Deliang Zhu, Yue Li, Yang Li, Shijie Guo

Manipulation and Robotic Arms · Robot Learning · Perception and Sensing
Abstract

In contact-rich manipulation, coarse-timescale action generation alone is often insufficient to maintain stable interaction when contact conditions change rapidly during execution. As a result, learned policies may struggle to suppress force spikes, incipient slip, and object or tissue deformation online, limiting robustness in high-contact tasks. To address this issue, we present Physics-Guided Diffusion Policy (PGDP), a visuo-tactile manipulation framework that combines diffusion-based motion intent generation with high-frequency tactile-constrained refinement. We develop a customizable three-dimensional curved-surface tactile sensor and a Tactile Manifold Prior Encoder (TMPE) to capture temporally structured tactile representations and provide physically informative visuo-tactile conditions for policy learning. At the motion-generation level, a low-frequency Conditional Diffusion Model (CDM) predicts short-horizon Cartesian action intents from visual and tactile context. At execution time, a high-frequency Physics-Guided Module (PGM) solves a constrained quadratic program in the joint space to track the diffusion intent while enforcing tactile-driven safety and contact-consistency requirements. Experiments on block-uprighting and massage tasks show that PGDP increases the unitless composite score from 0.80 to 0.87 and from 0.77 to 0.89, respectively, compared with the strongest two-rate decoder baseline.

RA-L 2026-06-08

Learning Context-Aware Neural ODE Dynamics for Adaptive Robotic Control

Shao-Yi Yu, Jen-Wei Wang, Maya Horii, Masayoshi Tomizuka, Vikas Garg

UAVs / Aerial Robots · Manipulation and Robotic Arms · Control and Dynamics
Abstract

Robotic systems deployed in uncertain and dynamically changing environments often face variations in contact conditions, aerodynamic effects, and external disturbances that challenge reliable control. To remain effective under model-based control, these systems require dynamics models that can adapt to such changes, especially when direct access to complete environmental information is limited. To enable adaptability and facilitate integration with model predictive control, we propose a context-aware dynamics model based on neural ordinary differential equations, which infers environmental factors from state-action histories using a two-phase training procedure. We validate the approach across diverse robotic platforms, including a quadrotor in simulation, as well as a Sphero BOLT robot and a Fanuc manipulator in real-world experiments. The results demonstrate that our method effectively adapts to temporally and spatially varying environmental changes across different tasks. Videos are available here, and the source code is available at our GitHub repository.

RA-L 2026-06-08

Robust Fall Recovery for Armless Bipedal-Wheeled Robots Via Force-Guided Learning

Haidong Hou, Zhangguo Yu, Tao Han, Hengbo Qi, Khaleel Ghazal, Yu Zhang, et al.

Humanoid Robots · Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Control and Dynamics
Abstract

Fall recovery is critical for autonomous legged locomotion. Existing methods have demonstrated that some legged robots, such as humanoids and quadrupeds, are capable of fall recovery from diverse postures by utilizing arms or coordinating multiple legs to generate support forces. Without arms or other legs to provide supportive assistance, a bipedal-wheeled robot must rely solely on the actuation of its legs, making recovery particularly difficult. To address this, we introduce FTSR (Force-guided Teacher-student framework with Stage-wise Rewards). The force-guided method constructs an external auxiliary force during simulation training that correlates directly with the robot's real-time height, explicitly formulating this force as an optimizable constraint. Through constrained reinforcement learning, the policy is guided toward gradually reducing force dependency and increasing the body height, developing internal recovery strategies despite having no arms for support. Height-progressive stage-wise rewards progressively structure posture stabilization during recovery and the transition to sustained locomotion, and are integrated with a teacher-student architecture that distills privileged knowledge of force effects and recovery dynamics. After simulation training, the policy is deployed on a physical armless bipedal-wheeled robot and extensively evaluated. Experiments confirm robust and reliable fall recovery under diverse challenging conditions, demonstrating strong environmental adaptability and motion robustness, while maintaining full post-recovery motion capability. The framework also generalizes effectively to a high-DOF humanoid, confirming its practical generalizability. The project page is available at https://2350575870.github.io/force-guided.github.io/.

Sci. Robotics 2026-04-29

A retrieval-augmented framework enabling VLM spatial awareness for object-centric robot manipulation

Kai Chen, Chengkun Li, Chang Tu, Jiahui Pan, Yiyao Ma, Wei Chen, et al.

Manipulation and Robotic Arms · Robot Learning · Perception and Sensing
Abstract

Connecting the semantic reasoning of vision-language models (VLMs) to the precise geometric demands of robotic manipulation remains a fundamental challenge. Although VLMs can interpret high-level commands, they lack the intrinsic spatial intelligence required for tasks demanding precise object placement, orientation, and physical reasoning. Here, we introduce Retrieval-Augmented Manipulation (RAM), an object-centric framework that endows general-purpose vision foundation models with the spatial reasoning necessary for robust manipulation. RAM bridges the semantic-to-geometric gap by grounding abstract concepts into an explicit, object-centric three-dimensional (3D) representation. This grounded information is then provided as augmented context to the VLM, empowering it to decompose complex instructions into a sequence of spatially precise and physically plausible subgoals. We demonstrate that RAM, in a zero-shot setting on a real-world robot, can execute these subgoals to fulfill complex spatial language instructions, complete spatially aware manipulation under the guidance of a single 2D image, and adaptively replan tasks by reasoning about physical constraints like object size and collisions. Quantitative evaluations on the Common Object in 3D (CO3D) dataset also validated that RAM’s core vision module generalizes to previously unseen object categories and is robust to variations in shape and occlusions. By providing a structured bridge between semantic intent and geometric execution, RAM represents a critical step toward developing more physically intelligent and general-purpose robotic systems.

RA-L 2026-06-12

IAOM: An Intention-Aware Optimization Model With GNN-Based Policy Learning for Dynamic Multi-Robot Cooperative Task Allocation

Qingyang Long, Liping Liang

Robot Learning · Multi-Robot / Swarm
Abstract

This paper studies dynamic multi-robot cooperative task allocation in multi-stage scout-rescue missions with unknown and evolving task demands. We propose IAOM, an Intention-Aware Optimization Model that encodes each robot's decision state (target task, role, and planner-derived ETA) as an intention vector and organizes robots and tasks into an execution-aware intention graph capturing residual demand and active commitments. A GAT-PPO policy is learned on this graph and coupled to a hybrid-A*-based motion layer, so that allocation, coalition formation, and congestion handling are optimized within a single framework. We evaluate IAOM on grid-based urban rescue benchmarks with multiple map sizes and resource density regimes against optimization-, heuristic-, and learning-based baselines. Under abundant and normal resources, IAOM consistently reduces mission makespan and average completion time relative to the best baseline, achieving up to about 20% shorter makespan and about 50% lower completion time with comparable load balance. Under tight resources where robots barely meet total demand, gains are smaller and all methods are largely bounded by resource scarcity, but IAOM remains competitive.

RA-L 2026-06-12

Correction to “Decision-Making for Autonomous Driving via a Coupled Reinforcement Learning Network Combined With Risk Assessment”

Chuan Hu, Yixun Niu, Hao Jiang, Xi Zhang, Xin Cheng

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

In [1], the corresponding author information was incorrectly published. Yixun Niu was incorrectly listed as the corresponding author. The correct corresponding author should be Xin Cheng. This correction applies only to the corresponding author information; all other contents of the article remain unchanged.

RA-L 2026-06-12

Real-to-Sim-to-Real: Learning Agile and Robust Recovery Skills with Terrain Imagination via Adversarial Imitation

Tangyu Qian, Huayang Yin, Zhen Kan

Legged / Quadruped Robots · Robot Learning
Abstract

Legged animals can self-right and stand up from arbitrary postures across diverse environments. Empowering robots with animal-like recovery capabilities expands their real-world applications. However, existing recovery controllers that rely on predefined trajectories are limited to flat terrain, whereas current learning-based policies often produce overly aggressive or sluggish control outputs, leading to poor hardware performance. To bridge the gap, this paper proposes a novel real-to-sim-to-real learning framework that enables quadruped robots to acquire both agile and robust recovery skills. Unlike previous learning approaches that train from scratch, we employ adversarial imitation learning with hardware-collected demonstrations to promote natural motions and enhance learning efficiency. For robust recovery on complex terrains, a terrain imagination module is integrated to predict key terrain properties via onboard sensing. To ensure reliable sim-to-real transfer, the recovery policy is trained with domain randomization, terrain curriculum learning, and geometry decomposition techniques. The trained policy can be generalized to various quadruped robots and is directly deployed on the Unitree Go2 hardware. To the best of our knowledge, this is the first time that a small-sized quadruped demonstrates such advanced recovery skills across diverse environments, outperforming prior studies with more powerful hardware. The project page is available at: https://rsr-recovery.github.io/.

RA-L 2026-06-12

PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition

Jinseop Lee, Byoungho Lee, Gichul Yoo

Navigation / SLAM / Autonomous Driving · Perception and Sensing
Abstract

We present PROBE (PRobabilistic Occupancy BEV Encoding), a learning-free LiDAR place recognition descriptor that models each BEV cell's occupancy as a Bernoulli random variable. Rather than relying on discrete point-cloud perturbations, PROBE analytically marginalizes over continuous Cartesian translations via the polar Jacobian, yielding a distance-adaptive angular uncertainty $\sigma_\theta = \sigma_{t} / r$ in $\mathcal{O}(R \cdot S)$ time. The primary parameter $\sigma_{t}$ represents the expected translational uncertainty in meters, a sensor-independent physical quantity that enhances cross-sensor generalization while reducing the need for extensive per-dataset tuning. Pairwise similarity combines a Bernoulli-KL Jaccard with exponential uncertainty gating and FFT-based height cosine similarity for rotation alignment. Evaluated on four datasets spanning four diverse LiDAR types, PROBE achieves the highest accuracy among handcrafted descriptors in multi-session evaluation and competitive single-session performance relative to both handcrafted and supervised baselines. The source code and supplementary materials are available at https://sites.google.com/view/probe-pr.
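The distance-adaptive relation in this abstract (angular uncertainty equal to translational uncertainty divided by range) can be made concrete with a few lines. This is only a sketch of that one formula, not the PROBE descriptor itself: the function names, the equal-width ring layout, and evaluating at ring centres are assumptions for illustration.

```python
def angular_sigma(sigma_t, r):
    """Angular uncertainty (rad) induced by translational uncertainty sigma_t (m) at range r (m)."""
    if r <= 0.0:
        raise ValueError("range must be positive")
    return sigma_t / r

def ring_sigmas(sigma_t, max_range, num_rings):
    """sigma_theta evaluated at the centre of each equal-width polar ring (assumed layout)."""
    width = max_range / num_rings
    return [angular_sigma(sigma_t, (i + 0.5) * width) for i in range(num_rings)]
```

Intuitively, a sideways shift of `sigma_t` metres subtends a large angle for nearby cells and a small one for distant cells, so near rings receive wider angular smoothing than far rings.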

RA-L 2026-06-12

Token Expand-Merge: Training-Free Token Compression for Vision-Language-Action Models

Yifan Ye, Jiaqi Ma, Jun Cen, Zhihe Lu

Robot Learning · Perception and Sensing
Abstract

Vision-Language-Action (VLA) models pretrained on large-scale multimodal datasets have emerged as powerful foundations for robotic perception and control. However, their massive scale, often billions of parameters, poses significant challenges for real-time deployment, as inference becomes computationally expensive and latency-sensitive in dynamic environments. To address this, we propose Token Expand-and-Merge-VLA (TEAM-VLA), a training-free token compression framework that accelerates VLA inference while preserving task performance. TEAM-VLA introduces a dynamic token expansion mechanism that identifies and samples additional informative tokens in the spatial vicinity of attention-highlighted regions, enhancing contextual completeness. These expanded tokens are then selectively merged in deeper layers under action-aware guidance, effectively reducing redundancy while maintaining semantic coherence. By coupling expansion and merging within a single feed-forward pass, TEAM-VLA achieves a balanced trade-off between efficiency and effectiveness, without any retraining or parameter updates. Extensive experiments on the LIBERO benchmark demonstrate that TEAM-VLA consistently improves inference speed while maintaining or even surpassing the task success rate of full VLA models.

RA-L 2026-06-12

Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video

Henrik Krauss, Johann Licher, Naoya Takeishi, Annika Raatz, Takehisa Yairi

Medical / Soft / Micro-Nano · Control and Dynamics
Abstract

Learning soft continuum robot (SCR) dynamics from video offers flexibility but existing methods lack interpretability or rely on prior assumptions. Model-based approaches require prior knowledge and manual design. We bridge this gap by introducing: (1) The Attention Broadcast Decoder (ABCD), a plug-and-play module for autoencoder-based latent dynamics learning that generates pixel-accurate attention maps localizing each latent dimension's contribution while filtering static backgrounds, enabling visual interpretability via spatially grounded latents and on-image overlays. (2) Visual Oscillator Networks (VONs), a 2D latent oscillator network coupled to ABCD attention maps for on-image visualization of learned masses, coupling stiffness, and forces, thereby enabling mechanical interpretability. We validate our approach on single- and double-segment SCRs, demonstrating that ABCD-based models significantly improve multi-step prediction accuracy with 5.8$\times$ error reduction for Koopman operators and 3.5$\times$ for oscillator networks on a two-segment robot. VONs autonomously discover a chain structure of oscillators. This fully data-driven approach yields compact, mechanically interpretable models with potential relevance for future control applications.
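To give a feel for what a chain of coupled oscillators with masses, coupling stiffness, and forces looks like as a dynamical model, the sketch below simulates a damped mass-spring chain anchored to a wall. It is a generic textbook model written for illustration only: it is 1D rather than the paper's 2D latent network, nothing here is learned from video, and the function name, the uniform damping term, and the integrator choice are all assumptions.

```python
def simulate_chain(masses, stiffness, damping, forces, dt=0.01, steps=5000):
    """Semi-implicit Euler for a 1D chain: wall -k[0]- m[0] -k[1]- m[1] - ...

    stiffness[i] is the spring to the left of mass i; forces[i] is a constant external force.
    Returns the final displacements.
    """
    n = len(masses)
    q = [0.0] * n   # displacements from rest
    v = [0.0] * n   # velocities
    for _ in range(steps):
        acc = []
        for i in range(n):
            left = q[i - 1] if i > 0 else 0.0           # the wall sits at displacement 0
            f = -stiffness[i] * (q[i] - left) - damping * v[i] + forces[i]
            if i + 1 < n:
                f += stiffness[i + 1] * (q[i + 1] - q[i])   # pull from the right neighbour
            acc.append(f / masses[i])
        v = [vi + dt * ai for vi, ai in zip(v, acc)]    # update velocity first,
        q = [qi + dt * vi for qi, vi in zip(q, v)]      # then position with the new velocity
    return q
```

With constant forces and positive damping, the chain settles at the static equilibrium where spring forces balance the applied forces, which is one way to sanity-check such a model.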

RA-L 2026-06-12

Continuum Robot State Estimation with Actuation Uncertainty

James M. Ferguson, Alan Kuntz, Tucker Hermans

Manipulation and Robotic Arms · Medical / Soft / Micro-Nano
Abstract

Continuum robots are flexible, slender manipulators well suited for confined surgical environments. In these settings, unknown interaction forces and model uncertainty significantly affect robot shape, motivating state estimation from external observations. Existing estimation methods either neglect actuation modeling or rely on simplified deterministic actuation models. In contrast, we jointly estimate robot shape, external loads, and actuation inputs using mechanically principled actuation priors. To achieve this, we present a discrete Cosserat rod formulation with piecewise-linear strain integration that provides high numerical accuracy while inducing a sparse factor graph structure for efficient nonlinear optimization. We extend the framework to tendon-driven and parallel robots in simulation and validate it experimentally on a surgical concentric tube robot. Overall, our approach enables principled real-time estimation across multiple robot architectures while providing direct access to manipulator Jacobians through the linearized factor graph.

RA-L 2026-06-12

CIRNet: Compact Iterative Refinement Network for Real-time Stereo Matching

Haopeng Wang, Zhanhong Chen, Yu Zhang, Yanbiao Sun, Jigui Zhu

Navigation / SLAM / Autonomous Driving · Perception and Sensing
Abstract

Stereo matching is crucial for 3D perception in autonomous driving and generalist robotics. Recent advances have shifted from cost-volume methods to iterative refinement approaches, which offer finer detail and better spatial consistency. However, iterative methods remain constrained by expensive matching encoding and numerous refinement iterations, limiting their use in latency-sensitive applications. To bring the accuracy advantage of iterative paradigms to real-time systems, we present the Compact Iterative Refinement Network (CIRNet). Specifically, we design Compact Correlation Encoding (CCE) to allocate essential matching evidence by functional attributes, constructing global geometric context at the deep level and fine-grained local similarity at the shallow level to reduce redundant computation and provide reliable coarse initialization. Meanwhile, we introduce Staged Iterative Refinement (SIR), which uses early coarse correction for large residuals and a lightweight shared operator for fine details, improving the accuracy-latency trade-off and reaching near-saturated accuracy at lower latency. Evaluations on SceneFlow and KITTI show that CIRNet sets a new state of the art in balancing accuracy and efficiency. Furthermore, CIRNet exhibits robust cross-domain generalization on multiple real-world datasets and practical deployability on an embedded edge platform.

RA-L 2026-06-12

Insect-Scale Magnetic Wheel-Based Climbing Robot with Micro Ultrasonic Motors and 3D Printed Planetary Gears

Takuro Akadochi, Mohamed M. Khalil, Tomoaki Mashimo

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving
Abstract

Miniaturization of climbing robots is of interest for accessing narrow and complex structures that are inaccessible to conventional robots. This paper presents the design, fabrication, and experimental validation of an insect-scale climbing robot with magnetic wheels. The robot integrates two key technologies: a micro ultrasonic motor, which is one of the smallest electricity-driven motors, and a micro planetary gear train with a gear ratio of 64, fabricated using a micro-stereolithography 3D printer. The combination of these technologies enables the miniaturization of climbing robots. With a differential steering mechanism driven by two geared motors, the prototype robot measures 13 mm×13 mm and weighs 0.77 g. To obtain a suitable magnetic adhesion force, the magnetic wheels are designed and optimized to ensure reliable locomotion on vertical surfaces. Experiments demonstrate that the robot can climb vertically while carrying a payload exceeding five times its own weight. The steerability tests confirm the robot's ability to perform turns along a desired trajectory, although some errors remain. These experiments have demonstrated the feasibility of the smallest magnet-wheel-based climbing robot, providing insights for future studies on micro-scale robotic exploration in constrained environments.

RA-L 2026-06-12

Reliable Range-Based Relative Localization Under Interval Excitation

Yue Wang, Qingkai Yang, Hao Cui, Hao Fang

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarm
Abstract

Reliable relative localization using only onboard sensing is challenging for multi-robot systems operating in GPS-denied environments, particularly when robots exhibit limited mutual motion, under which the available measurements cannot provide persistently excited regression signals. To address this challenge, this letter proposes an adaptive online relative localization method that enables accurate relative-position estimation using only onboard measurements under a mild interval excitation (IE) condition. By introducing two dynamic auxiliary variables, the relative localization problem is reformulated as an online parameter estimation framework, which guarantees globally exponential convergence without persistent excitation (PE). Furthermore, an explicit lower bound on the estimation gain is provided in terms of the strength and duration of the interval excitation induced by the robots' relative motion. Finally, simulations and physical experiments demonstrate that the proposed method achieves reliable localization performance under both interval and persistently excited motions.

RA-L 2026-06-12

Learning from Mistakes: Loss-Aware Memory Enhanced Continual Learning for LiDAR Place Recognition

Xufei Wang, Junqiao Zhao, Siyue Tao, Qiwen Gu, Wonbong Kim, Tiantian Feng

Navigation / SLAM / Autonomous Driving · Perception and Sensing
Abstract

LiDAR place recognition plays a crucial role in SLAM, robot navigation, and autonomous driving. However, existing LiDAR place recognition methods often struggle to adapt to new environments without forgetting previously learned knowledge, a challenge widely known as catastrophic forgetting. To address this issue, we propose KDF+, a novel continual learning framework for LiDAR place recognition that extends the KDF paradigm with a loss-aware sampling strategy and a rehearsal enhancement mechanism. The proposed sampling strategy estimates the learning difficulty of each sample via its loss value and selects samples for replay according to their estimated difficulty. Harder samples, which tend to encode more discriminative information, are sampled with higher probability while maintaining distributional coverage across the dataset. In addition, the rehearsal enhancement mechanism encourages memory samples to be further refined during new-task training by slightly reducing their loss relative to previous tasks, thereby reinforcing long-term knowledge retention. Extensive experiments across multiple benchmarks demonstrate that KDF+ consistently outperforms existing continual learning methods and can be seamlessly integrated into state-of-the-art continual learning for LiDAR place recognition frameworks to yield significant and stable performance gains. The code is available at https://github.com/Thunder-Volcano/KDF-plus.
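One simple way to realize "harder samples are drawn more often, but coverage is kept" is to mix a loss-proportional distribution with a uniform one. The sketch below does exactly that; it is an assumed illustration rather than the KDF+ sampling rule, and the names `replay_probabilities`, `sample_memory`, and the `uniform_mix` parameter are hypothetical. Losses are assumed to be non-negative.

```python
import random

def replay_probabilities(losses, uniform_mix=0.2):
    """Selection probabilities: loss-proportional, blended with a uniform floor for coverage."""
    n = len(losses)
    total = sum(losses)
    if total <= 0.0:
        return [1.0 / n] * n                      # no difficulty signal: fall back to uniform
    return [(1.0 - uniform_mix) * l / total + uniform_mix / n for l in losses]

def sample_memory(samples, losses, k, uniform_mix=0.2, seed=0):
    """Draw up to k distinct samples for the replay memory, favouring high-loss samples."""
    rng = random.Random(seed)
    probs = replay_probabilities(losses, uniform_mix)
    pool = list(range(len(samples)))
    chosen = []
    for _ in range(min(k, len(pool))):
        idx = rng.choices(pool, weights=[probs[i] for i in pool], k=1)[0]
        pool.remove(idx)                          # sample without replacement
        chosen.append(samples[idx])
    return chosen
```

The uniform floor guarantees that every sample keeps a probability of at least `uniform_mix / n`, so easy regions of the dataset are not dropped from the memory entirely.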

RA-L 2026-06-12

Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics

Xiaofei Wang, Mingliang Han, Tianyu Hao, Yi Yang, Yun-Bo Zhao, Keke Tang

Robot Learning · Perception and Sensing
Abstract

Vision-language-action (VLA) models are gaining attention in robotics, yet their robustness to adversarial attacks remains largely unexplored. Existing work shows that adversarial patches can mislead VLA-based robots but assumes full access to the entire execution trajectory, an unrealistic requirement in practice. We address this limitation by formulating a partially observable threat model, where the adversary can exploit only a short prefix of the trajectory to generate a fixed patch applied to all subsequent frames. Under this setting, we propose a two-phase framework. First, we localize the patch using the model's attention maps to identify visually critical regions that correspond to the full instruction. Then, we optimize the patch to disrupt the semantic grounding of target objects and increase the curvature of action trajectories, thereby compounding failures in both perception and control. Extensive experiments in simulation and real-world robotic environments show that our method sustains adversarial effects under partial observability, inducing long-horizon disruptions and significantly reducing task success rates.

RA-L 2026-06-05

Decentralized End-to-End Multi-AAV Pursuit Using Predictive Spatio-Temporal Observation via Deep Reinforcement Learning

Yude Li, Zhexuan Zhou, Huizhe Li, Yanke Sun, Yenan Wu, Yichen Lai, et al.

UAVs / Aerial Robots · Robot Learning · Perception and Sensing · Multi-Robot / Swarm
Abstract

Decentralized cooperative pursuit in cluttered environments is challenging for autonomous aerial swarms, especially under partial and noisy perception. Existing methods often rely on abstracted geometric features or privileged ground-truth states, and therefore sidestep perceptual uncertainty in real-world settings. We propose a decentralized end-to-end multi-agent reinforcement learning (MARL) framework that maps raw LiDAR observations directly to continuous control commands. Central to the framework is the Predictive Spatio-Temporal Observation (PSTO), an egocentric grid representation that aligns obstacle geometry with predictive adversarial intent and teammate motion in a unified, fixed-resolution projection. Built on PSTO, a single decentralized policy enables agents to navigate static obstacles, intercept dynamic targets, and maintain cooperative encirclement. Simulations demonstrate that the proposed method achieves superior capture efficiency and competitive success rates compared to state-of-the-art learning-based approaches relying on privileged obstacle information. Furthermore, the unified policy scales seamlessly across different team sizes without retraining. Finally, fully autonomous outdoor experiments validate the framework on a quadrotor swarm relying on only onboard sensing and computing. Project details are available at https://hitsz-mas.github.io/psto-aav-pursuit/.

T-RO 2026-05-22

Kinematically Constrained Marching for Optimal Reeds–Shepp Nonholonomic Path Planning on 2-D Cartesian Grids

Ibrahim Ibrahim, Wilm Decré, Jan Swevers

Navigation / SLAM / Autonomous Driving
Abstract

We present an effective solution for computing locally optimal Reeds-Shepp distances and paths for kinematically-constrained vehicles in environments represented as obstacle-rich 2D Cartesian occupancy grids, addressing the reliance of current methods on discretization and approximation techniques. Our solution leverages a visibility-based marching architecture with continuous analytical expressions for propagating the Reeds-Shepp distance function. We introduce a model to identify reachable and unreachable regions for Reeds-Shepp vehicles, accompanied by a comprehensive representation of the Reeds-Shepp distance function in both cases. Unlike existing approaches, our method computes locally optimal distances and smooth paths globally without discretizing vehicle orientations, motion primitives, or the PDE, and without gradient descent (GD) backtracking, ensuring both accuracy and computational efficiency. Extensive simulations in various environments demonstrate effective improvements over state-of-the-art methods, particularly in complex obstacle-rich scenarios. To facilitate adoption, we provide an open-source solver implemented in C++.

RA-L 2026-06-16

Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking

Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo, Abhinav Valada

Navigation / SLAM / Autonomous Driving
Abstract

Capturing 4D spatiotemporal scene structure is crucial for the safe and reliable operation of robots in dynamic environments. However, existing approaches typically address only part of the problem: they either provide coarse geometric tracking via bounding boxes or detailed 3D occupancy estimates that lack explicit temporal association and instance-level reasoning. In this work, we present Latent Gaussian Splatting (LaGS) for 4D Panoptic Occupancy Tracking (4D-POT). We revisit the underlying representation and model 3D features as a sparse set of feature-bearing Gaussians. These act as dynamic, volume-oriented keypoints that enable spatially continuous, distance-weighted aggregation of multi-view features before being splatted into a voxel grid for decoding. This point-centric formulation enables flexible, data-dependent receptive fields and long-range spatial interactions that are difficult to capture with local and dense voxel-based operators. A hierarchical Gaussian representation further enables multi-scale reasoning by combining global context from coarse super-points with fine-grained detail from higher-resolution streams. Extensive experiments on Occ3D nuScenes and Waymo demonstrate state-of-the-art performance for 4D-POT. We provide code and models at https://lags.cs.uni-freiburg.de/.

RA-L 2026-06-04

Lie-Algebraic Approach to Geometric Nonlinear $\mathcal {H}_\infty$ Inverse Optimal Attitude Tracking on $SO(3)$

Junsik Kim, Youngjun Joo, Youngjin Choi

UAVs / Aerial Robots · Manipulation & Robot Arms · Control & Dynamics
Abstract

This paper presents a geometric nonlinear $\mathcal{H}_\infty$ inverse optimal attitude tracking framework formulated on $SO(3)$ using its Lie algebra $\mathfrak{so}(3)$. By adopting exponential coordinates $\xi \in \mathfrak{so}(3)$, attitude errors are mapped directly onto the Lie algebra. This preserves a bijective, distortion-free relationship with the geodesic distance within the injectivity radius ($\Vert \xi \Vert < \pi$), while eliminating singularities and unwinding issues inherent in Euler angles and quaternions. A sliding-mode-inspired nonlinear $\mathcal{H}_\infty$ controller is developed, with robust stability established using the exponential coordinates and a systematic tuning methodology provided to facilitate practical implementation. Quadrotor simulations demonstrate faster convergence in large-angle maneuvers by mitigating the vanishing error and sluggish response of conventional methods. Experiments on a robotic manipulator validate the tuning rule for the prescribed disturbance attenuation level $\gamma$, showing that the peak tracking error scales as $\mathcal{O}(\gamma^{2})$ for small gains and $\mathcal{O}(\gamma)$ for large gains. These results demonstrate the robustness and predictability of the proposed framework and the effectiveness of Lie algebra-based attitude control on $SO(3)$.
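
The exponential-coordinate attitude error mentioned in the abstract can be illustrated with a short Python sketch. It computes the logarithm map of the rotation error $R_d^\top R$, whose norm is the geodesic angle. The function names (`rot_z`, `so3_log`, `attitude_error`) and the small-angle threshold are illustrative assumptions, not taken from the paper, and the controller itself is not reproduced.

```python
import math

def rot_z(angle):
    """Rotation about the z-axis, used here only to build test inputs."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul_T(a, b):
    """Return a^T b for 3x3 matrices stored as nested lists."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def so3_log(R, eps=1e-9):
    """Exponential coordinates xi of R; only valid for rotation angles below pi."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against round-off
    theta = math.acos(cos_theta)
    # vee of the skew-symmetric part (R - R^T) / 2, which equals sin(theta) * axis
    w = [(R[2][1] - R[1][2]) / 2.0,
         (R[0][2] - R[2][0]) / 2.0,
         (R[1][0] - R[0][1]) / 2.0]
    # theta / sin(theta) tends to 1 as theta tends to 0
    scale = 1.0 if theta < eps else theta / math.sin(theta)
    return [scale * c for c in w]

def attitude_error(R_desired, R_actual):
    """Error vector xi in so(3) for the rotation error R_desired^T R_actual."""
    return so3_log(matmul_T(R_desired, R_actual))
```

For example, `attitude_error(rot_z(0.2), rot_z(0.7))` gives a vector along z whose length is the 0.5 rad angle between the two attitudes. Angles at or very near $\pi$ fall outside the injectivity radius and are not handled by this sketch.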

RA-L 2026-06-04

Toward Reliable Vision-Guided Suction-and-Cut for Robotic Leaf Sampling in Arabidopsis Plants

Jiarui Chang, Weiwei Wan, Ryoichi Sato, Miki Fujita, Nobuyuki Tanaka, Masami Hirai, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Automated leaf sampling is essential for high-throughput genotype-phenotype mapping but remains challenging for small plants, such as Arabidopsis thaliana, that have dense canopies, tiny leaves (6–10 mm) and slender petioles (0.8–1 mm). Such morphological traits demand high positional accuracy and specialized manipulation to reliably isolate individual leaves. To address these challenges, we present a fully automated robotic sampling system that combines a suction-cutting end-effector with a two-stage vision framework for coarse-to-fine cutting-point alignment. The end-effector is driven by a non-interfering four-bar linkage, enabling compact motion and minimal disturbance to surrounding foliage. The control strategy iteratively replans motions based on local camera feedback until high-confidence alignment is achieved. Experiments across varied growth stages of Arabidopsis demonstrate high success rates with minimal disturbance, while ablation studies confirm the effectiveness of the selected perception methods. The system offers a reliable and precise automatic solution for plant genotype-phenotype mapping and broader laboratory automation workflows.

RA-L 2026-06-04

RLRC: Reinforcement Learning-Based Recovery for Compressed Vision-Language-Action Models

Yuxuan Chen, Yixin Han, Yize Huang, Xiao Li

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Vision-Language-Action models (VLAs) have demonstrated remarkable capabilities and strong potential in complex robotic manipulation. However, their large parameter sizes and high inference latency hinder real-world deployment, especially on resource-constrained platforms. To address this, we conduct a systematic empirical study of model compression for VLAs. Building on these insights, we present RLRC, a three-stage compression and recovery pipeline consisting of structured pruning, performance recovery via SFT and RL, and subsequent quantization. The RL stage incorporates a critic warm-up strategy and BC loss regularization to stabilize training and preserve policy behavior. RLRC achieves up to an 8× memory reduction and 2.3× inference speedup while maintaining the original task success rate. Extensive experiments across multiple VLA backbones show that RLRC consistently outperforms existing compression baselines, highlighting its effectiveness for on-device deployment.

RA-L 2026-06-04

Informative Planning With Attention-Based Hybrid Belief Reinforcement Learning for Aerial Multi-Target Search and Tracking

Zhengyu Hua, Yike Wu, Yuwei Li, Li Xing, Jidong Huang, Peng Li, et al.

UAVs / Aerial Robots · Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

Autonomous multi-target search and tracking (SaT) under limited sensing fields of view requires balancing the maintenance of precise estimates for visible targets with the recovery of those lost to intermittent observations. To address this, we present HyBE-RL, an integrated active perception framework that synergizes a hybrid belief estimator (HyBE) with an attention-driven reinforcement learning planner. The estimation layer employs a visibility-dependent mechanism to leverage negative information from non-detections, coupling parametric tracking with nonparametric search to maintain effective priors during observation gaps. Building on this representation, the planner utilizes a spatial-uncertainty attention mechanism to map continuous belief directly to optimal control actions, explicitly overcoming the discretization artifacts and heuristic sub-optimality inherent in conventional geometric planning. To facilitate robust learning, we incorporate a sampling-based geometric planner to provide expert-guided reward shaping and function as a runtime safety shield. Comprehensive simulations and real-world UAV experiments validate that the proposed approach not only outperforms standard baselines but also surpasses its geometric teacher in complex environments, achieving reduced tracking errors and superior target recovery.
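
The idea of using "negative information from non-detections" can be illustrated with a generic Bayes update on a grid belief. This is a textbook sketch, not the paper's hybrid belief estimator; the function name `update_on_non_detection` and the single detection probability `p_detect` are assumptions made for illustration.

```python
def update_on_non_detection(belief, in_fov, p_detect):
    """Bayes update of a grid belief after the sensor reports no target.

    belief   : list of cell probabilities summing to 1
    in_fov   : list of bools, True where the cell lies inside the field of view
    p_detect : probability of detecting a target that is inside the field of view
    """
    # Likelihood of "no detection" given that the target is in each cell.
    likelihood = [(1.0 - p_detect) if seen else 1.0 for seen in in_fov]
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("non-detection is impossible under this belief")
    return [u / total for u in unnormalized]
```

With a uniform belief over four cells, two of them observed and `p_detect = 0.8`, probability mass moves from the observed cells to the unobserved ones, which is what keeps a search prior alive while a target is out of view.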

RA-L 2026-06-04

MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics

Neil Janwani, Ellen Novoseller, Vernon J. Lawhern, Maegan Tucker

Humanoid Robots · Legged / Quadruped Robots · Robot Learning
Abstract

Multi-objective reinforcement learning (MORL) is a powerful tool to learn Pareto-optimal policy families across conflicting objectives. However, unlike traditional RL algorithms, existing MORL algorithms do not effectively leverage large-scale parallelization to concurrently simulate thousands of environments, thus facing vastly increased computation time. Ultimately, this has limited the application of MORL towards complex multi-objective robotics problems. To address these challenges, we present 1) MORLAX, a new GPU-native, fast MORL algorithm, and 2) MO-Playground, a pip-installable playground of GPU-accelerated multi-objective environments. Together, MORLAX and MO-Playground approximate Pareto sets within minutes, offering 26-271x speed-ups compared to legacy CPU-based approaches and up to 19x speed-ups over prior GPU-based approaches whilst learning superior Pareto front hypervolumes. We demonstrate MO-Playground's versatility by implementing a custom BRUCE humanoid robot environment and learning Pareto-optimal locomotion policies across 6 practical objectives in simulation, such as smoothness, efficiency and arm swinging.
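
The Pareto front hypervolume the abstract uses as a quality measure can be shown on a two-objective toy case. The sketch below is plain Python and unrelated to the MORLAX or MO-Playground code; the function names and the maximization convention are assumptions, and points are expected as tuples.

```python
def pareto_front(points):
    """Return the non-dominated points (maximization in every objective)."""
    front = []
    for p in points:
        dominated = any(
            all(q[i] >= p[i] for i in range(len(p))) and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (two objectives)."""
    # Sorted by the first objective descending, the second objective ascends.
    front = sorted(set(pareto_front(points)), key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        if x <= ref[0] or y <= prev_y:
            continue  # point adds nothing beyond the reference or earlier slabs
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area
```

For the returns `[(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]` with reference point `(0, 0)`, the front keeps the first three points and the hypervolume is the area of the staircase they span. A larger value means the policy family covers the trade-off better.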

T-RO 2026-05-27

Environmental Adaptation Enabled by an Amplitude-Tunable Traveling Wave Robot With a Soft Corkscrew (ATWBot)

Qinjie Ji, Aiguo Song, Sareum Kim, Josie Hughes

Abstract

Amplitude tuning is an important strategy in animals employing traveling wave patterns, enhancing their adaptability to unstructured environments. This paper proposes, for the first time, an amplitude tuning method that leverages the compliance of a soft corkscrew by twisting its ends. An amplitude tunable traveling wave robot (ATWBot) is developed, consisting of a soft corkscrew housed in a high DOF cage and driven by only two servos. The soft corkscrew and cage are monolithically 3D-printed. ATWBot achieves a wide range of active amplitude tuning with passive compliance adaptation, and can extend its morphology to a coiled configuration, enabling clamping and rolling. A comprehensive model is built for the twisted soft corkscrew geometry, proving that the robot's speed is decoupled from amplitude variations during twisting. A genetic algorithm is used to optimize the soft corkscrew for achieving the fastest speed while matching the cage geometry. Experiments demonstrate that the combination of active amplitude tuning and passive body compliance enables the robot to adapt to unstructured terrains including slits, steps, gaps, converging tunnels, slopes, and swimming.

RA-L 2026-06-15

Integrated by Design: A Hybrid Robotic Palm Fusing a Functionally-Dense Compliant Matrix with an Actuated Skeleton

Oliver S. Neumann, Lena S. Ewering, Robert K. Katzschmann

Manipulation & Robot Arms
Abstract

Inspired by the human hand, this work presents a hybrid soft-rigid architecture for a robotic palm to enhance dexterity and robustness. While soft-robotic hands often lack the structural integrity to handle heavy objects, classical rigid designs are sensitive to impacts and frequently neglect the palm's functional role. We bridge that gap by combining soft compliance with skeletal elements for robust force transmission. In our design, rigid metacarpal bones house local actuation and transmission units. These units receive torque from forearm-based motors via flexible shafts to drive tendon actuation. We embed these elements in a 3D-printed silicone matrix, which eliminates the design restrictions of traditional moulding. Ball joints for the thumb and little finger enable the formation of three palmar arches. In experiments, the palm achieved active and passive deformations of 74% and 67%, respectively, relative to its neutral width. This shape adaptability offers practical benefits in confined spaces, where internal palm movements enable new grasping and manipulation techniques previously difficult to achieve with traditional designs.

RA-L 2026-06-15

Height-Aware Navigation Framework for Single-DOF Deformable Mobile Robots Based on 3D Encoding

Feiqiang Wang, Yukuan Chen, Haobo Huang, Zufeng Shang, Jun Zhang, Fufu Yang

Navigation / SLAM / Autonomous Driving
Abstract

A deformable mobile robot with a single degree of freedom (single-DOF) can autonomously navigate and avoid obstacles in complex environments by changing its overall dimensions through deformation. This capability allows it to traverse passages that fixed-size robots cannot, such as height-restricted or narrow channels, thereby improving task efficiency. This paper proposes a height-aware navigation framework that projects the varying 3D geometric constraints into a grid map and costmap, enabling coupled motion-deformation planning for single-DOF robots. To achieve this, the height restrictions in complex three-dimensional environments are projected as grayscale information into a two-dimensional grid map. Using these grayscale values, we design a costmap layer to reflect the robot's footprint dimensions in different deformation states. An A* algorithm with height encoding then obtains a global path containing the robot's height constraints. Finally, an enhanced Dynamic Window Approach (DWA) algorithm is proposed to calculate the robot's movement and deformation speeds, enabling it to utilize its single-DOF deformation capability to traverse height-restricted or narrow obstacle passages. To validate the effectiveness of each core module and highlight the advantages over traditional methods, simulations and experiments were conducted using a Miura-ori based single-DOF deployable robot.
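
A much-simplified Python sketch of the "A* with height encoding" step follows. It assumes a grid whose cells store overhead clearance in metres (0 for a solid obstacle) and a robot whose height can vary between `h_min` and `h_max`. The function name, the 4-connected unit-cost grid and the rule "use the tallest height that fits" are all illustrative assumptions; the paper's grayscale encoding, footprint-dependent costmap layer and DWA stage are not modelled.

```python
import heapq

def height_aware_astar(clearance, start, goal, h_min, h_max):
    """Return [(cell, robot_height), ...] from start to goal, or None."""
    rows, cols = len(clearance), len(clearance[0])

    def passable(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and clearance[r][c] >= h_min

    def heuristic(cell):
        # Manhattan distance is admissible on a 4-connected unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    if not (passable(start) and passable(goal)):
        return None
    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                r, c = cell
                # Height constraint carried along the path for this cell.
                path.append((cell, min(h_max, clearance[r][c])))
                cell = came_from[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if passable(nxt) and g + 1 < cost.get(nxt, float("inf")):
                cost[nxt] = g + 1
                came_from[nxt] = cell
                heapq.heappush(open_heap, (g + 1 + heuristic(nxt), g + 1, nxt))
    return None
```

On a map where the only connection between two rooms is a 0.3 m high gap, a robot that can lower itself to 0.2 m gets a path annotated with the 0.3 m constraint at the gap, while a robot with a 0.4 m minimum height gets no path.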

T-RO 2026-05-08

Analysis and Mitigation of Pose Estimation Uncertainty on SE(3) for Magnetic Localization

Pingyu Xiang, Hongye Zhang, Yue Wang, Rong Xiong, Haojian Lu

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Magnetic localization, owing to its immunity to line-of-sight occlusion and non-contact nature, is considered a promising technology for medical applications. While it is intuitive that localization performance degrades as the target moves farther from the sensor array, uncertainty analysis, which is essential for quantifying localization quality, has long been overlooked. In this work, we present a pose estimation and uncertainty analysis framework on SE(3) for magnetic source localization using sensor arrays, which enables concise formulation of the problem and quantitative assessment of the results. The volume of the uncertainty ellipsoid is used to characterize localization confidence, while the surface shape in Cartesian space is used for visualization. This also provides insight into the effective workspace of the magnetic localization system prior to deployment. Guided by this analysis, we designed a movable magnetic sensor array to expand the limited sensing volume and mitigate localization uncertainty, thereby enhancing overall localization performance. Simulations and pose tracking experiments validate the effectiveness of this framework. By dynamically moving the sensor array to minimize the volume of the uncertainty ellipsoid, localization errors are reduced compared with static and projection-based strategies by 67.04% and 16.51% in position and by 43.87% and 20.73% in orientation, respectively. Furthermore, phantom experiments on distal locking screw alignment and magnetic capsule endoscope tracking demonstrate the system's capability in improving localization accuracy (reducing alignment errors by 71.88%) and expanding the effective workspace severalfold.
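
The use of ellipsoid volume as a confidence measure can be sketched for the 3D position block of a covariance matrix. This is a generic formula, not the paper's SE(3) formulation; `det3`, `ellipsoid_volume`, `pick_lowest_uncertainty` and the candidate names in the usage note are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def ellipsoid_volume(cov, k=1.0):
    """Volume of the k-sigma ellipsoid x^T cov^{-1} x <= k^2 for a 3x3 covariance."""
    d = det3(cov)
    if d <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Semi-axes are k * sqrt(eigenvalue), so their product is k^3 * sqrt(det).
    return 4.0 / 3.0 * math.pi * k ** 3 * math.sqrt(d)

def pick_lowest_uncertainty(candidates):
    """Name of the candidate sensor placement with the smallest ellipsoid volume."""
    return min(candidates, key=lambda name: ellipsoid_volume(candidates[name]))
```

Given predicted covariances for a few candidate array placements, for instance `{"near": ..., "far": ...}`, `pick_lowest_uncertainty` returns the placement whose ellipsoid is smallest. This mirrors, in a very reduced form, the idea of moving the array to minimize uncertainty volume.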

Sci. Robotics 2026-03-25 · Cited by 1

Electrofluidic fiber muscles

O. K. Afsar, G. Pupillo, G. Vitucci, W. Babatain, H. Ishii, V. Cacucciolo

Control & Dynamics
Abstract

Actuators are to robots what muscles are to humans. They enable motion and determine strength and dexterity. The fiber form factor makes skeletal muscles modular, scalable, and densely integrated (50% of human body weight). In contrast, servo motors that drive today’s robots lack the flexibility and modularity of muscle fibers, limiting integration and dexterity. Here, we report electrofluidic fiber muscles, soft artificial muscles for robotic applications with power density comparable to skeletal muscles (50 watts per kilogram), contraction strains of 20%, and response time of 0.3 second. These 2-millimeter-thick muscles comprise antagonistic fluidic actuators driven by electrohydrodynamic fiber pumps in a closed circuit. They require no external liquid reservoir and are electrically driven, untethered, and silent. We demonstrated that performance is increased by pre-pressurizing the muscles at an optimal bias pressure. Applying bias pressure allowed the antagonist actuator to act as a reservoir for the agonist, enabled 200% higher operating voltages by preventing cavitation, and leveraged the nonlinear pressure-stroke response of the actuators, increasing strain threefold at a given pump pressure. We characterized and modeled their dynamics, identifying optimal bias pressures. Electrofluidic muscles scale by simply bundling fibers. By selecting the ratio between pumps and actuators, we programmed their performance for different robotic tasks: a fast lever (180 millimeters per second) that launches objects in <0.3 second; a strong bundle that lifts 4 kilograms (200 times its weight) with a 30-millimeter stroke; a woven muscle that bends a robot arm by 40° and is compliant enough for a human handshake.

RA-L 2026-06-08

Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation

Qianyou Zhao, Yuliang Shen, Xuanran Zhai, Duidi Wu, Jin Qi, Ce Hao, et al.

Manipulation & Robot Arms · Robot Learning
Abstract

In visuomotor policy learning, diffusion-based imitation learning has become widely adopted for its ability to capture diverse behaviors. However, approaches built on ordinary and stochastic denoising processes struggle to jointly achieve fast sampling and strong multi-modality. To address these challenges, we propose the Hybrid Consistency Policy (HCP). HCP runs a short stochastic prefix up to an adaptive switch time, and then applies a one-step consistency jump to produce the final action. To align this one-jump generation, HCP performs time-varying consistency distillation that combines a trajectory-consistency objective to keep neighboring predictions coherent and a denoising-matching objective to improve local fidelity. In both simulation and on a real robot, HCP with 25 SDE steps plus one jump approaches the 80-step DDPM teacher in accuracy and mode coverage while significantly reducing latency. These results show that multi-modality does not require slow inference, and a switch time decouples mode retention from speed. It yields a practical accuracy–efficiency trade-off for robot policies. Project website: https://sites.google.com/view/hybrid-cp.

RA-L 2026-06-08

Force Estimation With Concentric-Tube Robots for Surgical Palpation

Yufei Wu, Joshua Gaston, Samuel Tobin, Lauren Branscombe, Caleb Rucker

Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano
Abstract

This paper presents the design, modeling, and experimental verification of a deflection-based force-sensing probe in a concentric-tube robot for palpation during robotic surgery. The proposed method leverages the inherent elastic compliance of the robot and a quasi-static Cosserat rod model to estimate contact forces from real-time tip deflections measured by a 5-DOF magnetic tracker, eliminating the need for distal force sensors. The system was experimentally validated on the Virtuoso Endoscopy System (Virtuoso Surgical, Nashville, TN) across multiple palpation locations and directions, including experiments on compliant tissue phantoms. The proposed method achieved a mean absolute force error of approximately 0.05 N, evaluated using a leave-one-group-out cross-validation scheme across palpation locations. Simulation studies using a real-time deformable-tissue model further demonstrate the tool's potential for automated tissue stiffness mapping, which is further demonstrated by stiffness mapping a tissue phantom to localize an embedded simulated tumor. These results establish a compact, sensor-efficient, and accurate framework for model-based force sensing in continuum robotic palpation, providing a foundation for future clinical applications in minimally invasive surgery.

RA-L 2026-06-08

Balancing Holding Torque and Dynamic Performance: Air-Lubricated, Friction-Utilizing Shoulder Joint for PAM-Driven Humanoids

Haruka Uehara, Takumi Kawasetsu, Tomoko Hirayama, Koh Hosoda

Humanoid Robots · Control & Dynamics
Abstract

Musculoskeletal humanoids driven by pneumatic artificial muscles (PAMs) offer high adaptability to environmental interactions, but replicating anatomical joint structures poses stability challenges, particularly at the shoulder. To prevent dislocation in anthropomorphic ball-and-socket joints, contact between the ball and socket must be maintained by antagonistic muscle tension. The resulting friction impedes motion and degrades dynamic performance; however, it also provides holding torque to resist external forces during position holding. This paper proposes an air-lubricated, friction-utilizing shoulder joint to balance holding torque and dynamic performance. In experiments, we compared the proposed joint against a low-friction rolling-contact joint. The results demonstrated that the proposed joint achieved a holding torque 1.3 times greater than that of the rolling-contact joint. Furthermore, air lubrication mitigated the friction-induced decline in dynamic performance, increasing the effective range of motion by up to 1.6 times and reducing the settling time to one-fifth of the value without air lubrication.

Sci. Robotics 2026-04-29

How foundation models will revolutionize robot swarms

Volker Strobel, Marco Dorigo, Mario Fritz

Robot Learning · Multi-Robot / Swarms
Abstract

Robot swarms are composed of many, typically simple, robots that accomplish complex tasks through local communication and decentralized coordination. Traditionally, robot controllers are designed before a mission using programming code. This process requires substantial development effort and limits the flexibility of the swarm. We discuss how onboard foundation models (FMs) could revolutionize this process through two complementary approaches. The first approach uses FMs as swarm designers to synthesize robot controllers and perform high-level planning. The second approach uses FMs as swarm operators to facilitate robot-robot collaboration and human-swarm interaction.

Sci. Robotics 2026-04-29

Dexterous grasping with an active palm

Amos Matsiko

Manipulation & Robot Arms · Perception & Sensing
Abstract

A tactile-responsive gripper with an active palm enables adaptive grasping and dexterous manipulation of objects.

RA-L 2026-06-01

ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation

Shizhe Zhang, Jingsong Liang, Zhitao Zhou, Shuhan Ye, Yizhuo Wang, Derek Ming Siang Tan, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Multi-Robot / Swarms
Abstract

Existing methods for multi-agent navigation typically assume fully known environments, offering limited support for partially known scenarios with outdated or imperfect prior maps, such as warehouses or factory floors. There, agents need to balance path optimality with collecting and sharing environmental information to help teammates reach their own targets. To these ends, we propose ORION, a novel deep reinforcement learning framework for cooperative multi-agent online navigation in partially known environments. Starting from an imperfect prior map, ORION trains agents to make decentralized decisions, coordinate toward individual targets, and actively reduce task-relevant map uncertainty through online observation sharing in a closed perception-action loop. We first design a shared graph encoder that fuses the prior map with online perception into a unified representation, providing robust state embeddings under environmental discrepancies. At the core of ORION is an option-critic framework that learns high-level cooperative modes translated into sequences of low-level actions, enabling adaptive switching between individual navigation and team-level exploration. We further introduce a dual-stage cooperation strategy that allows agents to assist teammates under map uncertainty, thereby reducing the overall makespan. Across extensive maze-like maps and large-scale warehouse environments, ORION achieves high-quality real-time decentralized cooperation while scaling to up to 10 robots, outperforming state-of-the-art classical and learning-based baselines. Finally, we validate ORION on physical robot teams, demonstrating its robustness and practicality for real-world cooperative navigation.

RA-L 2026-06-01

Prompt Optimization Through Reinforcement Learning for Generative Language Model Code Synthesis in Multi-Robot Systems

Kaushik Kannan, Jungyun Bae

Navigation / SLAM / Autonomous Driving · Robot Learning · Multi-Robot / Swarms
Abstract

In multi-robot systems (MRS) operating across various applications, real-time task allocation and path planning pose significant challenges, often requiring extensive human intervention under extreme time constraints. This paper introduces a novel framework that leverages Reinforcement Learning (RL) to automate and optimize the code generation process for MRS. Our approach trains an RL agent to dynamically generate optimized prompts for a Large Language Model (LLM). By refining prompt structures, templates, and parameters, the RL agent guides the LLM to produce efficient, feasible, and directly executable code for complex task allocation and path-planning problems. We demonstrate that the generated solutions are both complete and high-performing. The proposed method is evaluated on a sample case of search-and-rescue scenarios, using GPT-4.1 as the LLM, and demonstrates significant performance improvements over previous work that used manual prompt optimization for similar applications. This work represents a significant step towards autonomous multi-robot coordination in time-critical dynamic environments, reducing human workload and improving mission efficacy.

RA-L 2026-06-01

AutoPath: Learning Transferable Goal-Conditioned Stochastic Path Prior for Safe Navigation Without Human Demonstrations

Ziyang Zhang, Boyang Zhou, Zesong Yang, Haocheng Peng, Zeming Gai, Xiao Liang, et al.

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Real-time navigation in cluttered and dynamic environments requires collision-free and dynamically feasible motion under limited perception. However, feasible navigation behaviors are inherently multimodal because multiple paths may exist around obstacles. In this paper, we formulate navigation as learning a transferable goal-conditioned stochastic path prior that models a reusable distribution over goal-aligned, geometry-consistent local paths conditioned on local observations. This formulation enables structured sampling of navigation candidates, allowing multiple feasible paths to be explored through sampling without relying on robot-specific motion constraints. To this end, we introduce a goal-aligned canonical state representation that removes in-plane rotational ambiguity and normalizes local geometry with respect to the goal, enabling rotation-invariant path distribution learning. We further develop a structured prior learning framework that parameterizes local paths using a geometry-aware polar action manifold and incorporates risk-sensitive utility shaping with multi-goal distributional rollouts for stable and safety-aware planning. Extensive experiments in dense static environments and dynamic pedestrian scenarios demonstrate that the proposed method achieves consistently high success rates with competitive efficiency while enabling cross-platform transfer of a single path prior learned on differential-drive robots to quadruped platforms without retraining.

RA-L 2026-06-01

DST-Planner: Diffusion-Augmented Sequential Topological Planning for Robotic Exploration in Complex Environments

Yiqing Yuan, Zhi Li, Jiahui Liang, Peiming Duan, Xiaoxun Zhang, Jihe Guo, et al.

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

Robotic exploration in large-scale unstructured 3D environments presents significant challenges, requiring efficient long-horizon reasoning and robust decision-making. Traditional heuristic-based planners often struggle with local minima and lack scalability, while conventional online Reinforcement Learning (RL) approaches face prohibitive sample inefficiency and training instability. To overcome these challenges, we present DST-Planner, a novel framework that integrates 3D topological reasoning with offline diffusion-based reinforcement learning. Specifically, we leverage an incremental 3D skeleton graph to abstract complex environments into lightweight connectivity representations. We then introduce a novel diffusion critic architecture to accurately capture the multi-modal value distribution of exploration actions for robust value estimation. Simultaneously, we design a sequence-based actor to generate long-horizon topological plans that promote temporal consistency and topological continuity. By optimizing the policy via an advantage-weighted offline RL objective, our method ensures stable learning from fixed datasets and preserves exploration diversity. Extensive experiments in diverse simulation environments demonstrate that DST-Planner significantly outperforms state-of-the-art (SOTA) baselines in coverage efficiency. Notably, we demonstrate the zero-shot transferability of our simulation-trained policy to both aerial and ground robots in the real world, confirming its robustness to drastically different robotic kinematics and motion constraints without fine-tuning.

RA-L 2026-06-01

A Compliant Force Control Strategy for On-Orbit Assembly: Design, Implementation, and Reinforcement Learning-Based Optimization

Xinle Yan, Lingling Shi, Yong Hu, Yanan Liu, Minghe Shan

Manipulation & Robot Arms · Robot Learning · Control & Dynamics
Abstract

On-orbit assembly of large-scale space structures is critical for future space missions, yet it poses significant challenges for robotic systems. A fundamental difficulty in tasks like docking, which are highly contact-rich, lies in achieving both high precision and compliant force control to prevent damage. This trade-off between accuracy and safety is difficult to resolve using conventional methods. Furthermore, the development of advanced solutions is constrained by the scarcity of high-fidelity ground testbeds capable of emulating the complex contact dynamics of walking and docking. To address these issues, this paper presents a comprehensive system-level solution. First, a novel seven-degree-of-freedom walking and assembly integrated robotic platform, along with a high-load-bearing docking interface, is developed to facilitate practical verification. Second, based on this platform, a novel hybrid compliant control framework is proposed, which integrates a low-level impedance controller with a high-level Model-Based Reinforcement Learning (MBRL) planner. Unlike passive compliance schemes, the MBRL agent leverages a learned dynamics model to predict interaction forces and proactively generate corrective force trajectory commands. This hierarchical architecture allows the system to accommodate geometric uncertainties and maintain continuous contact without triggering excessive force spikes. Experimental results validate the effectiveness of the proposed approach. Compared to traditional impedance control, the hybrid control strategy significantly reduces contact forces during docking while achieving high final assembly accuracy. This work demonstrates a viable system-level solution, from hardware design to intelligent control, for advancing robotic on-orbit assembly.

RA-L 2026-06-01

Longitudinal Dynamics Modeling and Control of a Wheel-Legged Robot With Tilting Rotor Assistance

Zhiheng Dai, Xinglu Xia, Xiang He, Yingxun Wang

Legged / Quadruped Robots · Manipulation & Robot Arms · Control & Dynamics
Abstract

Benefiting from high-efficiency mobility, wheel-inverted pendulum robots have been widely deployed in scenarios such as cargo transportation and search. However, their longitudinal motion exhibits non-minimum-phase behavior, which can induce inverse motion and thus degrades stability in transient maneuvers such as start-up or braking. Meanwhile, the system is underactuated, which makes it difficult to flexibly coordinate robot attitude and wheel position. To address these issues, this paper applies tilting dual rotors to a wheel-legged robot platform to help improve longitudinal motion. The longitudinal dynamic model of this hybrid-configuration robot is derived and analyzed, and an equilibrium-based scheduled LQR controller is developed and implemented. Experimental results demonstrate that the integration of the tilting dual-rotor assembly reduces the maximum inverse-speed excursion by 57.14% and the peak leg pitch angle excursion by 37.4%, while also reducing the attitude settling time after disturbance by about 18%, enhancing transient ground-locomotion performance in the tested longitudinal maneuvers.

RA-L 2026-06-01

Geometrically Consistent Tactile Servoing via Hybrid Force-Position Control at the Center of Pressure

Sébastien Kleff, Lucas Joseph, Vincent Padois

Manipulation & Robotic Arms · Perception & Sensing · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract

In robotics, traditional force control lacks local contact information. Tactile sensors provide rich feedback on physical interaction, but remain difficult to integrate consistently into real-time control loops. In this paper, we show that hybrid force–position control, when expressed at the Center of Pressure (CoP), becomes geometrically consistent with tactile sensing. Based on this insight, we formulate a tactile servoing approach that allows explicit control of contact pose and force at the CoP. Unlike conventional tactile servoing techniques, which are tightly coupled to a specific sensor and often treat the contact wrench as a disturbance to be rejected, our approach relies on a generic and physically grounded feature space. We derive a hybrid force-position control law based on the Jacobian at the CoP that naturally decouples force and motion subspaces, ensuring geometric consistency during contact interaction. Real-world experiments on a robotic manipulator demonstrate robust contact maintenance and improved force tracking compared to standard image-based and pose-based controllers in a physical interaction regulation task.
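
As a minimal illustration of the Center of Pressure that the abstract above builds on, the sketch below computes the CoP of a planar tactile array as the pressure-weighted centroid of its taxel positions. The taxel layout, the readings, and the function name `center_of_pressure` are hypothetical; this is not the authors' controller, only the geometric quantity it is expressed at.

```python
from typing import Sequence, Tuple


def center_of_pressure(
    positions: Sequence[Tuple[float, float]],
    pressures: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (x, y, total_force) for a planar taxel array.

    positions: taxel centers in the sensor frame, in meters (hypothetical layout).
    pressures: non-negative normal force per taxel, in newtons.
    """
    total = sum(pressures)
    if total <= 0.0:
        raise ValueError("no contact: total normal force is zero")
    # Pressure-weighted average of the taxel coordinates.
    x = sum(p * px for (px, _), p in zip(positions, pressures)) / total
    y = sum(p * py for (_, py), p in zip(positions, pressures)) / total
    return x, y, total


# 2x2 taxel grid with 10 mm pitch; contact is biased toward +x.
taxels = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01)]
forces = [1.0, 3.0, 1.0, 3.0]
cop_x, cop_y, f_n = center_of_pressure(taxels, forces)
```

With these readings the CoP shifts toward the more heavily loaded +x column, which is the kind of local contact information a pure force sensor at the wrist does not expose directly.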

RA-L 2026-06-01

HCDiff: Hierarchical Latent Constraint-Projected Diffusion Framework for Deformable Linear Objects Manipulation in Cluttered Environments

Yixiong Du, Lu Liu, Zhongkai Zhang

Manipulation & Robotic Arms · Robot Learning · Control & Dynamics
Abstract

Manipulating Deformable Linear Objects (DLOs) in cluttered environments is challenging due to their high dimensionality and complex constraints. In this paper, we propose a Hierarchical Constraint-projected Diffusion (HCDiff) framework that decomposes the problem into global constraint-aware planning and local robust control through a coarse-to-fine generative process. At the high level, we propose a Latent Control Barrier Function (LCBF) projection that is integrated directly into the score-based denoising process of a diffusion planner. By projecting the gradient flow within the latent manifold learned by a Graph AutoEncoder (GAE), our method biases the generation of subgoals toward collision-free and physically feasible configurations. At the low level, we develop a Goal-conditioned Diffusion Policy (GDP) for DLO manipulation that is trained solely on unstructured, task-agnostic play data. By offloading local control to this task-agnostic policy, our high-level planner only requires a small number of task-specific human demonstrations, thereby significantly reducing the overall dependency on expensive expert data. Furthermore, we employ a dual GDP control strategy to enhance control robustness. Comprehensive experiments in 2D and 3D demonstrate that HCDiff outperforms multiple baselines in terms of task success rate and planning efficiency. Crucially, the framework exhibits superior adaptability, successfully navigating environments with modified obstacle geometries beyond the training distribution.

RA-L 2026-06-01

Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation

Runhua Zhang, Junyi Hou, Changxu Cheng, Qiyi Chen, Tao Wang, Wuyue Zhao

Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

Diffusion policies (DP) have demonstrated significant potential in visual navigation by capturing diverse multimodal trajectory distributions. However, standard imitation learning (IL), which most DP methods rely on for training, often inherits sub-optimality and redundancy from expert demonstrations, thereby necessitating a computationally intensive “generate-then-filter” pipeline that relies on auxiliary selectors during inference. To address these challenges, we propose Self-Imitated Diffusion Policy (SIDP), a novel framework that learns improved planning by selectively imitating a set of trajectories sampled from itself. Specifically, SIDP introduces a reward-guided self-imitation mechanism that encourages the policy to consistently produce high-quality trajectories efficiently, rather than outputs of inconsistent quality, thereby reducing reliance on extensive sampling and post-filtering. During training, we employ a reward-driven curriculum as a gated cold-start protector to ensure training stability, and goal-agnostic exploration as a regularizer to preserve the diffusion model's inherent multimodal distribution. Extensive evaluations on a comprehensive simulation benchmark show that SIDP significantly outperforms previous methods, with real-world experiments confirming its effectiveness across multiple robotic platforms. On Jetson Orin Nano, SIDP delivers 2.5× faster inference than the baseline NavDP (110 ms vs. 273 ms), enabling efficient real-time deployment. Project page: https://rhzhang1.github.io/sidp.github.io/.

RA-L 2026-06-01

Stable Tracking-in-the-Loop Control of Cable-Driven RCM Surgical Manipulators Under Erroneous Kinematic Chains

Neelay Joglekar, Fei Liu, Florian Richter, Michael C. Yip

Manipulation & Robotic Arms · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Remote Center of Motion (RCM) robotic manipulators have revolutionized Minimally Invasive Surgery, enabling precise, dexterous surgical manipulation within the patient's body cavity without disturbing the insertion point on the patient. Accurate RCM tool control is vital for incorporating autonomous subtasks like suturing, blood suction, and tumor resection into robotic surgical procedures, reducing surgeon fatigue and improving patient outcomes. However, these cable-driven systems are subject to significant joint reading errors, corrupting the kinematics computation necessary to perform control. Although visual tracking with endoscopic cameras can correct errors on in-view joints, errors in the kinematic chain prior to the insertion point are irreparable because they remain out of view. No prior work has characterized the stability of control under these conditions. We fill this gap by designing a provably stable tracking-in-the-loop controller for the out-of-view portion of the RCM manipulator kinematic chain. We additionally incorporate this controller into a bilevel control scheme for the full kinematic chain. We rigorously benchmark our method in simulated and real-world settings to verify our theoretical findings. Our work provides key insights into the next steps required for the transition from teleoperated to autonomous surgery.

RA-L 2026-06-01

Real-to-Sim for Highly Cluttered Environments via Physics-Consistent Inter-Object Reasoning

Tianyi Xiang, Jiahang Cao, Sikai Guo, Guoyang Zhao, Andrew F. Luo, Jun Ma

Manipulation & Robotic Arms · Perception & Sensing · Control & Dynamics
Abstract

Reconstructing physically valid 3D scenes from single-view observations is a prerequisite for bridging the gap between visual perception and robotic control. However, in scenarios requiring precise contact reasoning, such as robotic manipulation in highly cluttered environments, geometric fidelity alone is insufficient. Standard perception pipelines often neglect physical constraints, resulting in invalid states, e.g., floating objects or severe inter-penetration, rendering downstream simulation unreliable. To address these limitations, we propose a novel physics-constrained Real-to-Sim pipeline that reconstructs physically consistent 3D scenes from single-view RGB-D data. Central to our approach is a differentiable optimization pipeline that explicitly models spatial dependencies via a contact graph, jointly refining object poses and physical properties through differentiable rigid-body simulation. Extensive evaluations in both simulation and real-world settings demonstrate that our reconstructed scenes achieve high physical fidelity and faithfully replicate real-world contact dynamics, enabling stable and reliable contact-rich manipulation. Our project page is at: https://physics-constrained-real2sim.github.io/.

RA-L 2026-06-01

SA4Depth: Consistent Pose-Depth Scale Alignment for Self-Supervised Monocular Depth Estimation

Changxuan Li, Nadine Berner, Nassir Navab, Federico Tombari, Stefano Gasperini

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Self-supervised depth estimation from monocular sequences relies on the joint learning of a depth and a pose network. Despite abundant research done to improve the depth network, efforts on the pose remain limited. In this context, even when depth is estimated up to scale, we highlight the importance of the alignment between the scene scales estimated by the pose and depth nets. Then, we introduce SA4Depth, an approach to improve this alignment and boost the depth predictions while keeping the inference time unchanged. Our proposed method uses the depth estimated during training to reproject learnable visual features across consecutive frames and refine the pose estimates by reducing feature alignment residuals. With our method, the scene scales estimated by the separate depth and pose networks are aligned, and the prediction scale consistency is improved across different sequences. Our differentiable refinement integrates seamlessly into existing self-supervised pipelines and substantially improves their depth estimates. We demonstrate this with extensive experiments both outdoors and indoors on KITTI, Cityscapes, and NYUv2. Additionally, results on KITTI Odometry confirm the effectiveness of our pose refinement.
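
To make the pose-depth scale coupling described above concrete, here is a small sketch of pinhole reprojection between two frames. It assumes an identity relative rotation and made-up intrinsics, and the function name `reproject` is hypothetical; it is not SA4Depth itself. It only shows that scaling depth and translation by the same factor leaves the reprojected pixel unchanged, while scaling depth alone does not.

```python
def reproject(u, v, depth, t, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Reproject pixel (u, v) of frame 1 into frame 2.

    Assumptions for this sketch: pinhole camera with hypothetical intrinsics,
    identity relative rotation, relative translation t = (tx, ty, tz).
    """
    # Back-project the pixel to a 3D point using its depth.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Move the point into the second camera frame (rotation is identity here).
    xc, yc, zc = x + t[0], y + t[1], z + t[2]
    # Project back onto the image plane.
    return fx * xc / zc + cx, fy * yc / zc + cy


base = reproject(400.0, 260.0, depth=5.0, t=(0.2, 0.0, -0.5))
# Depth and translation share the same scale factor (3x): same pixel.
aligned = reproject(400.0, 260.0, depth=15.0, t=(0.6, 0.0, -1.5))
# Depth is rescaled but the pose is not: the reprojection drifts.
misaligned = reproject(400.0, 260.0, depth=15.0, t=(0.2, 0.0, -0.5))
```

In this toy setting `base` and `aligned` coincide, whereas `misaligned` lands roughly 20 pixels away, which is why a photometric or feature residual is sensitive to a scale mismatch between the two networks.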

IJRR 2026-05-22

Faster algorithms for growing collision-free convex polytopes in robot configuration space

Peter Werner, Thomas Cohn, Rebecca H. Jiang, Tim Seyde, Max Simchowitz, Russ Tedrake, et al.

Navigation / SLAM / Autonomous Driving
Abstract

We propose two novel algorithms for constructing convex probabilistically collision-free polytopes in robot configuration space. Finding these polytopes enables the application of stronger motion-planning frameworks such as trajectory optimization with Graphs of Convex Sets (Marcucci et al., 2023) and is currently a major roadblock in the adoption of these approaches. In this paper, we build upon the IRIS-NP algorithm (Iterative Regional Inflation by Semidefinite & Nonlinear Programming) of Petersen and Tedrake (2023) to significantly improve tunability, runtimes, and scaling to complex environments. IRIS-NP uses nonlinear programming paired with uniform random initialization to find configurations on the boundary of the free configuration space. Our key insight is that finding nearby configuration-space obstacles using sampling is inexpensive and greatly accelerates region generation. We propose two algorithms using such samples to either employ nonlinear programming more efficiently (IRIS-NP2) or circumvent it altogether using a massively parallel zero-order optimization strategy (IRIS-ZO). Both algorithms employ a novel termination condition that controls the probability of exceeding a user-specified permissible fraction-in-collision, eliminating a significant source of tuning difficulty in IRIS-NP. We further present an approach for applying both algorithms in parametrized configuration spaces. We compare the performance across eight robot environments, showing that IRIS-ZO achieves an order-of-magnitude speed advantage over IRIS-NP, which is extended roughly by an additional order of magnitude by parallelizing it with a GPU. IRIS-NP2, also significantly faster than IRIS-NP, builds larger polytopes using fewer hyperplanes, which has the additional benefit of accelerating downstream motion planning. Website: https://sites.google.com/view/fastiris.
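
The idea of a sampling-based certificate on the fraction-in-collision can be illustrated with a textbook bound. If the true colliding fraction of a region is at least ε, the probability that N independent uniform samples are all collision-free is at most (1 − ε)^N, so N ≥ ln δ / ln(1 − ε) samples suffice to keep that probability below δ. The sketch below computes this N; it is a simplified stand-in and not necessarily the termination test used by IRIS-NP2 or IRIS-ZO, and the function name is hypothetical.

```python
import math


def samples_for_certificate(epsilon: float, delta: float) -> int:
    """Smallest N with (1 - epsilon)**N <= delta.

    epsilon: permissible fraction-in-collision (0 < epsilon < 1).
    delta:   tolerated probability of wrongly accepting a region whose
             colliding fraction is at least epsilon (0 < delta < 1).
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


# Example: at most 1% in collision, with at most a 5% chance of being wrong.
n = samples_for_certificate(epsilon=0.01, delta=0.05)
```

The required sample count grows only logarithmically in 1/δ, which is one reason cheap, massively parallel sampling is attractive for this kind of check.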

Sci. Robotics 2026-04-22

Object-centric task representation and transfer using diffused orientation fields

Cem Bilaloglu, Tobias Löw, Sylvain Calinon

Manipulation & Robotic Arms · Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract

Curved objects pose a fundamental challenge for task transfer in robotics: Unlike planar surfaces, curved surfaces do not admit a global reference frame. As a result, task-relevant directions such as “toward” or “along” the surface vary with position and geometry, making object-centric tasks difficult to transfer across shapes. To address this, we introduce an approach using diffused orientation fields, a smooth representation of local reference frames, for expressing and transferring tasks across curved objects. By expressing manipulation tasks in these smoothly varying local frames, we reduce the problem of transferring tasks across curved objects to establishing sparse keypoint correspondences. Our representation is computed online from raw point cloud data using diffusion processes governed by partial differential equations, conditioned on keypoints. We evaluate our method under geometric, topological, and keypoint perturbations and demonstrate successful transfer of tasks requiring continuous physical interaction such as coverage, slicing, and peeling across varied objects.

RA-L 2026-06-12

One to Rule Them All: Reinforcement Learning Policy Selection with Closed-Loop Sensitivity

Andrea Pupa, Alberto Dionigi, Paolo Robuffo Giordano, Cristian Secchi, Gabriele Costante

Robot Learning
Abstract

In reinforcement learning-based control, reliable zero-shot transfer from simulation to real hardware remains a major open challenge. The offline training phase typically produces multiple models, each corresponding to a different policy, and choosing which model to deploy on physical hardware is a critical decision. However, there is no established methodology for this selection process. A common practice is to deploy the policy that achieves the highest reward in simulation, without evaluating its robustness to uncertainties. As a result, due to the sim-to-real gap, the selected policy may perform poorly when transferred to a physical robotic system, potentially leading to unsafe or undesirable behavior. This work proposes to exploit the notion of “closed-loop sensitivity”, applying it for the first time to the analysis and comparison of different reinforcement learning models, i.e., different policies, and to the estimation of their expected performance on real hardware. The proposed metric captures how the closed-loop system reacts to modeling errors, providing a quantitative measure of policy robustness. Both simulation results and real-world experimental campaigns demonstrate that the proposed approach enables reliable zero-shot transfer to a real robot, while effectively handling uncertainties and reducing the risk of unsafe behaviors.
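
A toy version of sensitivity-based policy selection can be sketched as follows. It assumes a scalar linear plant x[k+1] = a·x[k] + b·u[k] with two hypothetical proportional policies, and estimates how strongly each closed-loop trajectory changes with the plant parameter a using central finite differences. The plant, the gains, and the norm used for ranking are all assumptions for illustration; the metric in the paper is defined by the authors and may differ substantially.

```python
def rollout(gain, a, b=1.0, x0=1.0, steps=20):
    """Simulate x[k+1] = a*x[k] + b*u[k] under the policy u = -gain * x."""
    xs = [x0]
    for _ in range(steps):
        u = -gain * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


def sensitivity_norm(gain, a_nominal=1.2, eps=1e-4):
    """L2 norm of d(trajectory)/d(a), estimated by central differences."""
    hi = rollout(gain, a_nominal + eps)
    lo = rollout(gain, a_nominal - eps)
    return sum(((h - l) / (2 * eps)) ** 2 for h, l in zip(hi, lo)) ** 0.5


# Two hypothetical policies; both stabilize the nominal plant (a = 1.2).
candidates = {"policy_A": 0.5, "policy_B": 1.1}
scores = {name: sensitivity_norm(k) for name, k in candidates.items()}
# Prefer the policy whose closed loop reacts least to the modeling error in a.
selected = min(scores, key=scores.get)
```

Here both policies drive the nominal system to zero, so a reward computed in simulation may not separate them clearly, yet their trajectories differ markedly in how much they move when the plant parameter is perturbed.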

RA-L 2026-06-12

Illumination-Adaptive Ordered Checkerboard Corner Detection for Underwater-Acquired Calibration Images

Guodong Wang, Yunxiu Zhang, Xuejiao Yang, Donghao Liu, Qifeng Zhang

Perception & Sensing
Abstract

Reliable underwater calibration pipelines require a robust front-end that can localize checkerboard corners at sub-pixel accuracy and assign consistent row–column indices across views. Underwater-acquired calibration images often suffer from spatially varying illumination, motion blur, specular highlights, dome-port distortion, and large viewpoint changes, which weaken corner evidence and reduce the fraction of calibration-usable views. We present an illumination-adaptive framework for ordered checkerboard corner detection in underwater-acquired calibration images. The method combines illumination-adaptive feature enhancement, structured prediction of corner confidence, sub-pixel offsets, and row/column indices, and Hungarian assignment for topology-consistent decoding. On an independent synthetic degradation benchmark, the proposed method achieves 75.4% view-aware correct corner rate (VCCR) at a 0.5-pixel threshold on the composite degradation setting, outperforming the best baseline with Multi-Scale Retinex with Color Restoration preprocessing by 9.9 percentage points. On real underwater data, under a shared OpenCV calibration backend, it achieves the lowest reprojection error among compared ordered-output methods and remains best on a fixed common-view subset. In downstream stereo reconstruction on a second real-world underwater stereo dataset, our calibration increases the common valid stereo area ratio from 67.3% to 88.8% and reduces checkerboard plane/edge root mean square error (RMSE) from 9.35/3.04 mm to 6.02/1.46 mm.
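
The role of assignment in producing topology-consistent corner ordering can be shown on a toy 2×2 board. The sketch below matches unordered detections to (row, col) grid slots by minimizing total squared distance. For brevity it brute-forces all permutations, which gives the same optimum a Hungarian solver would return on this small input; real boards would use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`. The detections, slot positions, and distance-based cost are hypothetical, and the paper's cost is built from its network predictions rather than from this simple distance.

```python
from itertools import permutations


def optimal_assignment(cost):
    """Return perm such that row i is matched to column perm[i] at minimum cost.

    Brute force over permutations: fine for a toy board, far too slow for
    real ones, where a Hungarian solver would be used instead.
    """
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


# Hypothetical detections in pixels, deliberately listed out of order.
detections = [(101.0, 52.0), (49.0, 51.0), (50.5, 99.0), (99.0, 101.0)]
# Hypothetical expected pixel position of each (row, col) grid slot.
slots = {
    (0, 0): (50.0, 50.0),
    (0, 1): (100.0, 50.0),
    (1, 0): (50.0, 100.0),
    (1, 1): (100.0, 100.0),
}

slot_ids = list(slots)
cost = [
    [(dx - sx) ** 2 + (dy - sy) ** 2 for (sx, sy) in slots.values()]
    for (dx, dy) in detections
]
match = optimal_assignment(cost)
# Map each grid index to the detection assigned to it.
ordered = {slot_ids[j]: detections[i] for i, j in enumerate(match)}
```

Because the matching is one-to-one, no two detections can claim the same row-column index, which is the property that keeps indices consistent across views.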

RA-L 2026-06-12

Asking for information by evaluating the communication and coordination trade-off in multi-agent POMDPs

Enza I. Trombetta, Elisa Capello, Dylan A. Shell, Federico Rossi

Multi-Robot / Swarms
Abstract

This letter concerns decentralized multi-agent systems in which individual agents operate under partial observability and uncertainty, but with the option of invoking inter-agent communication. When information exchange is infeasible or prohibitively expensive (owing to bandwidth limits, energy constraints, or the demands of clandestine operation), one is prevented from continually maintaining a joint belief over the underlying system state. In such cases, the decision of when to request information from others is key. In revisiting the prior work addressing this decision head-on, we see conservative strategies: agents are constrained to act solely on information that is accessible to all, with useful local observations being ignored whenever communication is too costly. Such an approach, though apposite to high-risk settings when consistent decision-making must be guaranteed, can degrade overall performance significantly. We explore how agents may integrate their local information, even when it has not been globally shared, and then communicate only when prudent; we term this an ask-type paradigm. This re-formulation, in which the agents are not compelled to synchronize, leads to a novel method that enables the decentralized execution of a centralized policy by reasoning, at execution time, about whether the cost of exchanging information will be offset by the information's value. We empirically evaluate our approach across several benchmarks and a case study, assessing in each case its effectiveness in trading between tolerating some loss of coordination and reducing communication costs.

RA-L 2026-06-12

Enhancing ROS Debugging: A User-Centric Diagnostic Framework

Kavindie Katuwandeniya, Samith Rajapaksha Jayasekara Widhanapathirana, Dana Kulić, Leimin Tian

Perception & Sensing
Abstract

The Robot Operating System (ROS) is an open-source software framework often utilised in robotics, providing standardised libraries for building complex robotic systems. However, ROS's distributed architecture and messaging system create barriers for understanding robot status and diagnosing system anomalies, particularly for developers with limited ROS expertise. We propose ROS Help Desk (RosHD), an agentic AI system for autonomous anomaly detection and debugging of ROS-based robotic systems, with expertise-adaptive assistance. RosHD continuously monitors system logs and multimodal sensor streams (lidar, RGB cameras) to detect anomalies, and provides developers with customised debugging support via specialised sensor analysis tools, diagnostic tools, ROS diagnostic utilities, code analysis capabilities, and a unified chat interface. We present the first empirical evaluation of expertise-adaptive LLM-powered debugging for ROS through a controlled user study with novice through advanced developers (N = 28). Participants using RosHD achieved 92.85% task success compared to 17.86% with baseline assistance, reducing debugging time by up to 72% across experience levels.

RA-L 2026-06-12

Grasping On Time: a Time-Controlled Dynamic Grasp Planning with Limited Jerk and Online Adaptation for Moving Objects

Adnan Khalid, Emilio Maranci, Carlo Alberto Avizzano, Salvatore D'Avella

Manipulation & Robotic Arms
Abstract

Grasping moving objects with minimal response time while limiting jerk is essential for many industrial and service robotics applications, and is considerably more challenging than grasping static targets. Existing approaches, such as sample-based methods, are not well-suited to this task because the stochastic nature of their solutions often produces jerky or oscillatory motions. This work introduces a method for grasping objects in motion on a conveyor belt by formulating a lightweight quadratic programming problem that explicitly controls the robot's arrival time at the target pose while constraining jerk. The method further supports real-time replanning, enabling the system to handle unpredictable events and environmental uncertainties, such as conveyor starts and stops or fluctuating velocities, without prior knowledge of object speed and without requiring synchronization between the robot and the conveyor. We demonstrate the capabilities of the proposed dynamic grasp-planning framework through multiple experiments on a physical robot at very high speeds up to 16 m/min (i.e., 26.66 cm/s). A comparison with a state-of-the-art baseline highlights the superior performance of the proposed approach.
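
The trade-off between arrival time and jerk can be illustrated with the classic closed-form minimum-jerk profile, used here as a simpler stand-in for the paper's quadratic program. For a rest-to-rest move of distance D in time T, the position is D·(10τ³ − 15τ⁴ + 6τ⁵) with τ = t/T, and the peak jerk magnitude is 60·D/T³, so the shortest arrival time under a jerk limit J is T = (60·D/J)^(1/3). The function names and numbers below are hypothetical.

```python
def min_jerk_position(t, duration, distance):
    """Position along a rest-to-rest minimum-jerk profile at time t."""
    tau = min(max(t / duration, 0.0), 1.0)
    return distance * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)


def min_jerk_jerk(t, duration, distance):
    """Third time-derivative of the profile, valid for 0 <= t <= duration."""
    tau = t / duration
    return distance / duration**3 * (60 - 360 * tau + 360 * tau**2)


def shortest_duration(distance, jerk_limit):
    """Smallest arrival time whose peak jerk (60*D/T^3) meets the limit."""
    return (60.0 * abs(distance) / jerk_limit) ** (1.0 / 3.0)


# Hypothetical move: 0.3 m with a jerk limit of 50 m/s^3.
T = shortest_duration(distance=0.3, jerk_limit=50.0)
# Sample the jerk over the whole motion to confirm the limit is respected.
peak = max(abs(min_jerk_jerk(i * T / 200, T, 0.3)) for i in range(201))
end_position = min_jerk_position(T, T, 0.3)
```

Unlike this fixed profile, an optimization-based planner can also account for a moving target and be re-solved online, which is the capability the abstract emphasizes.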

RA-L 2026-06-05

Physics-Constrained Real-Time Hand Grasping in VR/AR via Force Closure Stability Judgment

Junjian Lin, Yinhaoyu Jin, Jianjian Wang, Pingfa Feng, Dingwen Yu, Xiangyu Zhang, et al.

Manipulation & Robotic Arms · Control & Dynamics
Abstract

Constructing hand-object interactions in digital environments holds significant value for robotic learning of object manipulation. We propose a physics-constrained real-time hand-object interaction method with two key innovations: (1) real-hand-driven articulation control under physical constraints, where the virtual hand follows real-hand motion through articulation joint drives while the physics engine handles collisions and surface sliding, enabling continuous gesture adjustment during contact; (2) force closure theory (FCT) for real-time stability judgment, enabling automatic grasp stability determination from contact points without prior object knowledge or training data. We implement and validate our approach on VR and AR headsets with visual-only hand tracking, demonstrating realistic hand-object interactions across various rigid object geometries in digital environments.

RA-L 2026-06-04

Speech-Driven Gesture Generation via Conditional Flow Matching With Masked Training and Clamped Sampling

Bowen Wu, Carlos Toshinori Ishi, Takashi Minato, Hiroshi Ishiguro

Humanoid Robots · Robot Learning
Abstract

The ability to perform co-speech gestures is essential for robots with human-like appearances. In this paper, we present a novel conditional flow matching-based generative model to generate co-speech gestures from speech input. Moreover, we propose masked training and the corresponding masked generation to ensure smooth transitions in sequential gesture movements when processing segmented, long speech inputs. Furthermore, masked generation inherently supports general-purpose partial gesture completion. While objective evaluation and a user study confirm that our method outperforms existing baselines in gesture naturalness, it shows weak speech appropriateness. In addition, to enable direct deployment on humanoid robots, we introduce clamped sampling, which enforces physical limits during generation, thereby avoiding hard post-processing (e.g., clamping) that may distort the intended gestures. We conducted a proof-of-concept experiment to demonstrate the efficacy of clamped sampling.

RA-L 2026-06-04

Track-Centric Iterative Learning for Global Trajectory Optimization in Autonomous Racing

Youngim Nam, Jungbin Kim, Kyungtae Kang, Cheolhyeon Kwon

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

This paper presents a global trajectory optimization framework for minimizing lap time in autonomous racing under uncertain vehicle dynamics. Optimizing the trajectory over the full racing horizon is computationally expensive, and following such a trajectory in the real world hardly assures global optimality due to uncertain dynamics. Yet, existing work mostly focuses on dynamics learning at the control level, without updating the trajectory itself to account for the learned dynamics. To address these challenges, we propose a track-centric approach that directly learns and optimizes the full-horizon trajectory. We first represent trajectories through a track-agnostic parametric space in light of the wavelet transform. This space is then efficiently explored using Bayesian optimization, where the lap time of each candidate is evaluated by running simulations with the learned dynamics. This optimization is embedded in an iterative learning framework, where the optimized trajectory is deployed to collect real-world data for updating the dynamics, progressively refining the trajectory over the iterations. The effectiveness of the proposed framework is validated through simulations and real-world experiments, demonstrating lap time improvement of up to 8.9% over a nominal baseline and consistently outperforming state-of-the-art methods.

RA-L 2026-06-04

Corridor-Driven Topological Planning With Nonlinear MPC for Agile Quadrotor Flight

Honghao Pan, Hang Wang, Farshad Arvin, Junyan Hu

UAVs / Aerial Robots · Control & Dynamics
Abstract

Trajectory planning and control for quadrotors remain challenging when aiming to achieve agile and safe flight in cluttered environments. In this paper, we present a corridor-driven topological planning and control framework for agile quadrotor flight. To achieve this goal, a topological search generates multiple homotopy-distinct reference paths, and a time-optimal problem allocates execution time to provide high-quality initialisation. Then, we propose a piecewise parametric safe corridor, consisting of off-centred elliptical cross sections whose parameters vary smoothly with the path. Additionally, we develop an efficient reformulation for real-time computation while preserving continuity. Based on this corridor representation, a nonlinear model predictive controller tracks the optimal reference trajectory and enforces corridor and dynamic constraints. Both simulations and real-world experiments demonstrate that the proposed planner achieves agile flight under corridor constraints and real-time performance in complex environments.
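
A corridor built from off-centred elliptical cross sections can be pictured with a simple containment test. The sketch below interpolates the ellipse offset and semi-axes linearly between two stations along the path and checks whether a point in the plane normal to the path lies inside. The station format, the linear interpolation (a placeholder for whatever smooth parametrization the paper uses), and the function names are all assumptions.

```python
def lerp(a, b, s):
    """Linear interpolation between a and b for s in [0, 1]."""
    return a + (b - a) * s


def inside_corridor(point, s, station0, station1):
    """Check a point against the interpolated elliptical cross section.

    point:    (y, z) coordinates in the plane normal to the path.
    s:        normalized position between the two stations, in [0, 1].
    stationN: dict with 'offset' (cy, cz) and 'axes' (a, b); hypothetical format.
    """
    cy = lerp(station0["offset"][0], station1["offset"][0], s)
    cz = lerp(station0["offset"][1], station1["offset"][1], s)
    a = lerp(station0["axes"][0], station1["axes"][0], s)
    b = lerp(station0["axes"][1], station1["axes"][1], s)
    # Standard ellipse inequality, evaluated about the shifted center.
    return ((point[0] - cy) / a) ** 2 + ((point[1] - cz) / b) ** 2 <= 1.0


start = {"offset": (0.0, 0.0), "axes": (1.0, 0.5)}
end = {"offset": (0.4, 0.0), "axes": (0.6, 0.5)}

# Halfway along, the ellipse is centered at y = 0.2 with semi-axes (0.8, 0.5).
right_ok = inside_corridor((0.9, 0.0), 0.5, start, end)
left_ok = inside_corridor((-0.7, 0.0), 0.5, start, end)
```

The second point would be accepted by a centered ellipse with the same semi-axes, but the shifted center rejects it, which is how an off-centred section can hug an obstacle on one side while leaving room on the other.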

RA-L 2026-05-29

VBT-MPC: Vision-Based Tactile MPC for Contour Following

Edison Velasco-Sanchez, Luis F. Recalde, Guanrui Li, Pablo Gil

Manipulation & Robotic Arms · Perception & Sensing · Control & Dynamics
Abstract

Tactile sensing plays a key role in robotic manipulation, particularly in tasks like surface inspection. Successful execution requires maintaining contact while accurately tracking object contours. In this work, we propose a Vision-Based Tactile Model Predictive Control (VBT-MPC) framework for robotic contour following using a Vision-Based Tactile Sensor (VBTS) mounted in an eye-in-hand configuration. The proposed controller operates directly in contour feature space, thereby avoiding the need for separate pose-estimation modules or complex force-control architectures. We further compare our VBT-MPC with visual-servoing strategies adapted to tactile features, and evaluate contour tracking on objects with diverse geometries and materials in both simulation and real-world experiments.

RA-L 2026-05-29

Wearable Guiding System With Traction-Kinesthetic Feedback for the Visually Impaired

Chunhao Peng, Dapeng Yang, Deyu Zhao, Li Jiang, Hong Liu

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Existing electronic travel aids remain limited in providing stable and comprehensive daily guidance support. Inspired by shoulder-guiding in human assistance, this paper develops and evaluates a wearable guiding system based on traction-kinesthetic feedback. The system integrates RGB-D perception, path planning, traction cues, and voice interaction, enabling navigation, obstacle avoidance, and object localization in structured indoor environments. Twelve subjects (8 sighted, 4 blind) completed three comparative experiments and a multi-stage comprehensive experiment. Compared with the standard cane, traction-kinesthetic feedback reduced task completion time (sighted: 23.8–54.5%; blind: 30.7–48.1%) and walking distance (sighted: 7.3–55.8%; blind: 16.5–43.6%), while improving walking speed (sighted: 7.6–26.3%; blind: 9.4–25%) and yielding smoother and more stable walking trajectories. Compared with auditory feedback, traction-kinesthetic feedback was associated with more prompt turns, suggesting improved path continuity and movement responsiveness. Questionnaire results indicated high perceived intuitiveness and quietness (91.67% of subjects), suggesting lower auditory/cognitive load. Subjects reported wearing comfort (75% comfortable, 25% moderate), excellent interaction (83.33%), and stable operation (91.67%). Moreover, the multi-stage comprehensive experiment demonstrated the effective integration of traction guidance and voice assistance, indicating practical usability and application potential.

RA-L 2026-05-29

Design and Control of a Multimodal Rotor-Assisted Pendulum-Driven Spherical Robot

Dongshuai Huo, Hanxu Sun, Xiaojuan Lan, Minggang Li

Legged / Quadruped Robots · Multi-Robot / Swarms · Control & Dynamics
Abstract

To address the inherent limitations of conventional pendulum-driven spherical robots (PDSRs), notably the chase-overshoot limit cycle oscillation (CLCO), slow dynamic response, and restricted terrain adaptability, this paper presents a rotor-assisted PDSR design, designated the Rotor-Assisted platform (RA). The proposed system augments a traditional two-degree-of-freedom (2-DOF) pendulum drive with a dual-rotor mechanism featuring passive opening/closing, tilting, and bidirectional rotation, enabling three locomotion modes (Mode 1–Mode 3). A rotor-pendulum cooperative control framework is developed, employing state-dependent torque allocation and an adaptive effectiveness matrix to achieve stable, coordinated actuation across all modes. Experimental results demonstrate that the rotor-assisted approach effectively mitigates this oscillation (with up to 82% reduction in pitch-rate root mean square (RMS)), significantly enhances attitude stability and maneuverability, and enables agile in-situ rotation as well as climbing on slopes up to 32° from a standstill. These capabilities validate RA as a versatile platform for dynamic, multi-terrain applications.

RA-L 2026-05-29

Residual Rotation Correction Using Tactile Equivariance

Yizhe Zhu, Zhang Ye, Boce Hu, Haibo Zhao, Yu Qi, Dian Wang, et al.

Manipulation & Robotic Arms · Robot Learning · Perception & Sensing
Abstract

Visuotactile policy learning augments vision-only policies with tactile input, facilitating touch-reliant manipulation tasks where tactile sensing is essential. However, collecting demonstrations that cover diverse object orientations within the gripper can be expensive, making sample efficiency the key requirement for developing visuotactile policies. We present EquiTac, a framework that leverages the planar SO(2) rotational symmetry of an object held by a parallel gripper to improve sample efficiency and generalization for visuotactile policy learning. EquiTac reconstructs surface normal maps from RGB tactile images to obtain a rotation-consistent representation. An SO(2)-equivariant network predicts a yaw residual from a single tactile frame, which is composed with the output of a pretrained visuomotor policy to correct orientation errors at test time. On a real robot, EquiTac achieves robust generalization to unseen object orientations within grippers using very few training samples, where baselines fail even with more training data. By explicitly encoding SO(2) tactile equivariance, EquiTac provides a lightweight residual correction mechanism for orientation-sensitive parallel-gripper manipulation.
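
Composing a yaw residual with a policy's output, as described above, amounts to adding two planar rotations and wrapping the result. The sketch below shows only that composition step for scalar yaw angles; the function names are hypothetical, and the residual here is a hand-picked number rather than the output of an equivariant network.

```python
import math


def wrap_angle(theta):
    """Wrap an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def corrected_yaw(policy_yaw, residual):
    """Compose the policy's yaw command with a predicted yaw residual.

    Planar rotations commute, so composition is addition followed by wrapping.
    """
    return wrap_angle(policy_yaw + residual)


# A 170 degree command plus a 20 degree residual wraps around to -170 degrees.
yaw = corrected_yaw(math.radians(170.0), math.radians(20.0))
```

Wrapping matters in practice: without it, a small correction near ±180° would be interpreted as a nearly full turn in the opposite direction.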

AuRo 2026-06-19

STEM: Semantic target search and exploration using MAVs in cluttered environments

Nikhil Sethi, Max Lodel, Laura Ferranti, Robert Babuška, Javier Alonso-Mora

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Autonomous target search is crucial for deploying Micro Aerial Vehicles (MAVs) in emergency response and rescue missions. Existing approaches either focus on 2D semantic navigation in structured environments – which is less effective in complex 3D settings, or on robotic exploration in cluttered spaces – which often lacks the semantic reasoning needed for efficient target search. This paper overcomes these limitations by proposing a novel framework that utilizes a semantically-guided viewpoint planner to minimize target search and exploration time in unstructured 3D environments using an MAV. Specifically, we develop a combinatorial planner that generates efficient semantic exploration plans by prioritizing viewpoints that likely lead to the target. To guide the planner towards the target, an active perception pipeline is developed that propagates semantic priorities of observed objects into neighboring frontier voxels for computing semantic information gains of frontier viewpoints. In addition, we demonstrate how LLM-based similarity scores can be leveraged as semantic priority input to our pipeline. Evaluations in two distinct simulation environments show that the proposed method consistently outperforms baselines by quickly finding the target while maintaining reasonable exploration times. Real-world experiments with an MAV further demonstrate the method’s ability to handle practical constraints like limited battery life, small sensor range, and semantic uncertainty.

JFR 2026-06-19

A Review on Search and Rescue Robots in Complex Scenarios: Key Technologies of Simultaneous Localisation and Mapping

Tianyi Chen, Adam Rushworth, Fuhua Jia

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Control & Dynamics
Abstract

This paper presents a comprehensive review of robotics research in search and rescue (SAR) operations conducted in caverns, underground environments, disaster zones, and other areas where Global Navigation Satellite System (GNSS) signals are unavailable. The majority of applications for Simultaneous Localisation and Mapping (SLAM), despite its maturity, are still restricted to structured indoor settings or outdoor environments under normal weather conditions. Standard SLAM frameworks often experience degradation and malfunction or even fail when deployed in increasingly complex and unstructured scenarios. This review identifies three major challenges that robots face in SAR environments: (i) increasingly complex terrain, (ii) changing environments and visibility, and (iii) autonomous exploration requirements, along with corresponding technological evolutions in robot mobility, sensor technologies, and SLAM algorithms. A comprehensive and quantitative evaluation of existing approaches is provided, focusing on SLAM on uneven terrain, multisensor fusion, and active SLAM. Additionally, this paper outlines ongoing challenges for guiding future development toward more robust and reliable deployment‐oriented SLAM solutions for SAR applications. These include: (i) short‐term dynamics and structural changes that undermine data association and loop closure, (ii) observability loss and degeneracy in confined and cluttered spaces, and (iii) multirobot consistency under constrained communication. Two cross‐cutting constraints, which are sensor non‐stationarity and safety‐critical autonomy, are highlighted as key factors that turn deployable SAR SLAM into a system‐level reliability problem. Finally, potential research directions and a practical research roadmap toward robust, real‐time, and evaluable SAR SLAM are outlined.

RA-L 2026-06-15

Learning 3D Scene Reconstruction From Nighttime Driving Videos

Andrea Ramazzina, Stefano Gasperini, Mario Bijelic, Felix Heide, Federico Tombari

Abstract

Neural Radiance Fields and Gaussian Splatting have emerged as powerful representations for reconstructing complex dynamic urban scenes from multi-view driving logs. By producing photorealistic and geometrically consistent renderings, these methods offer a foundation for closed-loop simulation as well as a scalable data augmentation engine, enabling the synthesis of diverse viewpoints and conditions without the need for costly additional data collection in the real world. However, existing approaches are almost exclusively tailored to well-lit, daytime environments and struggle with the challenges of nighttime settings, where noise, strong flares, and moving light sources dominate the visual signal. To address these limitations, we propose NightSplat, a Gaussian Splatting framework specifically designed for high-fidelity reconstruction and novel view synthesis of nighttime driving scenes. Through dedicated components, our method explicitly models sensor noise and lens flare effects, and introduces a lightweight module to represent dynamic vehicle lights. Extensive evaluations on nuScenes and Waymo Open Dataset demonstrate that NightSplat significantly outperforms prior state-of-the-art baselines, both quantitatively and qualitatively, thereby extending simulation and testing to nighttime scenes. Project page: night-splat.github.io

RA-L 2026-06-08

H-VSJ: A Hybrid Stiff-Compliant Variable Stiffness Joint With Decoupling Position and Stiffness Modulation

Lei Yan, Junpeng Gu, Ziye Zheng, Sheng Liu, Shilong Sun, Fujun Peng, et al.

Manipulation & Robot Arms
Abstract

The wide spectrum of robotic applications motivates the intrinsic stiffness modulation of the robot besides compliance control, especially for highly dynamic manipulation. Variable Stiffness Actuators (VSAs) are designed for improving impact resistance, stiffness adaptability, and dynamic performance, while many existing variable stiffness mechanisms still face coupling between position and stiffness modulation. This paper presents a Hybrid Stiff-Compliant Variable Stiffness Joint (H-VSJ) that employs a decoupling mechanism to achieve independent control of joint position and stiffness as well as hybrid stiff-compliant modes. By adjusting the force-transmission ratio between the elastic element and the output link along the joint axis, H-VSJ achieves the decoupled and accurate stiffness modulation. The decoupling mechanism also allows the joint to smoothly switch between the stiff mode and compliant mode. The stiffness model of H-VSJ is established, and the influence of key design parameters on stiffness is further analyzed. Finally, the effectiveness of the proposed H-VSJ is validated through experiments on buffering motion range, stiffness-position decoupling, stiffness analysis, dynamic stiffness modulation, and impact-buffering capability, which demonstrate H-VSJ has a maximum deflection of $30.20^\circ$ in the low-stiffness state, an average stiffness modulation error of 7.00%, a stiffness range of 23.57–$260.36~\mathrm{N\ m/rad}$ in the compliant mode, and $8630.25~\mathrm{N\ m/rad}$ in the stiff mode.

RA-L 2026-06-08

GPU-Accelerated Simulation of Densely Connected Tensegrity Networks for Statistical Analysis of Morphological Convergence

Yanqiu Zheng, Nobuyuki Masuda, Zebing Mao, Cong Yan, Chengyan Zhao, Longchuan Li

Control & Dynamics
Abstract

Densely connected tensegrity networks provide a useful testbed for morphology-driven dynamics, but their quadratic cable interactions make large-scale, long-horizon simulation prohibitively expensive for CPU-based implementations, limiting quantitative studies of morphological properties. This study presents a size-scalable dynamic model and a GPU-accelerated simulator for two-dimensional multi-rod tensegrity systems under dense connectivity, while preserving the same tension-only cable mechanics and fixed-step time integration scheme as a CPU baseline. Leveraging the resulting throughput, we conduct Monte Carlo sweeps over system size and randomized initial conditions, and quantify morphological convergence to a conservative feasibility set using simple, interpretable scalar metrics. Across the tested range, larger networks exhibit reduced variability in morphology statistics, higher convergence rates, and typically shorter convergence times. These results provide quantitative evidence that strong internal coupling supports statistically robust morphological convergence, and show that GPU acceleration enables reproducible, large-scale evaluation of morphology-driven dynamics in densely connected tensegrity networks.

RA-L 2026-06-08

SynergyGrasp: A Structure-Aware Synergy Framework for Multi-Hand Grasp Generation

Wanhao Niu, Hao Sun, Yongfeng Rong, Zifan Zhu, Yuanyan Xie, Huaidong Zhou, et al.

Manipulation & Robot Arms
Abstract

Generating dexterous grasps for high degree-of-freedom hands remains challenging due to the need for coordinated finger motion and physical consistency. Conventional frameworks often overlook the hand's intrinsic kinematic topology, leading to self-collisions or unnatural poses. SynergyGrasp is introduced as a structure-aware pipeline for multi-hand dexterous grasp generation. The framework first generates physically consistent grasp postures on primitive objects through gravity–magnet closure, then learns Graph-Laplacian Principal Component Analysis (GL–PCA)-based synergy variables from hand kinematic topology and observed joint correlations, and finally refines wrist pose and synergy coordinates under force-closure and collision objectives. Across multiple robotic hands, this pipeline supports stable grasp synthesis and the construction of a unified grasp dataset. Experiments on six robotic hands demonstrate that GL–PCA generates smoother subspaces compared to standard PCA. SynergyGrasp achieves a 59.48% grasp success rate with high physical validity, and hardware tests confirm the generation of robust, feasible grasps across diverse morphologies.

RA-L 2026-06-08

FlowPed: Proprioceptive State Fusion and Efficient Diffusion for Pedestrian Trajectory Prediction in Manufacturing Plants

Shaoze Yang, Shreyas Bhat, Doo Won Han, Al Salour, Terra Stroup, Rumzi Barakat, et al.

Robot Learning
Abstract

Autonomous Guided Vehicles (AGVs) must anticipate human movement to operate safely in manufacturing plants. Unlike outdoor crowd settings, shop floors offer rich, reliable environmental priors (CAD/HD maps, stations, sidewalks, flow rules), but little shareable trajectory data and tight latency budgets. We present FlowPed, a trajectory forecasting framework tailored to this regime. FlowPed (i) fuses training-time privileged plant signals, including finite-automaton (FAM) motion states and geometry-aware features to stabilize learning under data scarcity; (ii) abstracts behavior via a VQ-VAE that discovers discrete latent “actions” and augments a gated residual + Transformer encoder to form a compact condition vector; and (iii) generates futures with an Elucidated Diffusion Model (EDM) implemented as a Diffusion Transformer (DiT) with few-step Karras sampling for real-time inference. We pretrained on public pedestrian repositories using only generic kinematics and fine-tuned on a VR shop-floor dataset (30 participants; 10 hours) instrumented with worker pose/gaze and AGV state. In ablations, FlowPed achieved 2.5× faster inference than a 5-step flow-matching baseline on AGV-class hardware (0.022 s vs. 0.056 s per forecast on an RTX A2000) while maintaining or improving accuracy (e.g., 0.6% $L_{\text{2 mean}}$, 25.6% $L_{\text{2 max}}$ at 4 s). Pre-training yielded consistent gains across decoder variants (4.2–6.4% $L_{\text{2 mean}}$, 3.7–5.0% $L_{\text{2 max}}$). The results indicate that the combination of plant priors, discrete latent actions, and few-step diffusion enables accurate, low-latency pedestrian prediction suitable for embedded AGV stacks in environment-rich, but data-scarce factories.

RA-L 2026-06-08

Target-Attacker-Defender Differential Game With Pursuit-Constrained Attacker

Ningchen Sun, Qian Ma

Multi-Robot / Swarms
Abstract

This paper presents a novel target-attacker-defender differential game with a pursuit-constrained attacker modeled as autonomous robots. The attacker enjoys a speed advantage but is restricted to a predefined operational region. Considering the effect of the attacker's capture region on game outcomes, two winning scenarios for the attacker and three for the target-defender team are analyzed. Under the games-of-kind framework, the winning conditions for three players are established for each scenario. With practical payoff functions, the optimal strategies of all robots are derived. Simulation results demonstrate the validity of the proposed approach, which can be applied to multi-agent confrontation, collaborative defense and autonomous decision-making systems.

RA-L 2026-06-08

SpinJORR: Spin actuated Jumping Omnidirectional Rolling Robot

Venkata Ravindhra Reddy Varikuti, ArunSai Kammari, Ali Mohseni, Phanindra Tallapragada

Legged / Quadruped Robots
Abstract

This paper presents the design of a compact, spin actuated rolling and jumping robot. The robot can use the rolling modality for efficient motion and jumping over obstacles. The robot consists of a fully 3D-printed circular hoop enclosing an internal pendulum-like unbalanced rotor and a tip-mounted rotor that together enable rolling, steering, and jumping through controlled actuation. The design is based on a dynamic model that exploits unconventional mechanics to achieve the versatile capabilities. Unlike traditional jumping robots that rely on elastic components to store and release energy, the proposed design employs a purely momentum-based actuation mechanism, reducing relaxation times between jumps to almost zero. Experimental results demonstrate stable steering, high forward jumps of more than four body lengths (0.9 m), long forward leaps reaching nine body lengths (1.8 m), and reverse jumps reaching approximately five body lengths (1.1 m). The robot can recover from a fall again solely through the gyroscopic action of the internal rotors, coupled with ground contact forces. Experiments demonstrate its ability to traverse a variety of terrains. The simplicity and compactness of the design, as well as its versatility and agility, make the system suitable for applications requiring locomotion over unstructured terrains.

RA-L 2026-06-08

Design and Analysis of a Continuum Robot With Triple-Layer Intra-Segment Coupling and Inter-Segment Actuation Decoupling

Donghua Shen, Wentao Zhou, Yali Han, Chunlei Tu, Xingsong Wang, Qi Zhang, et al.

Medical / Soft / Micro-Nano
Abstract

Continuum robots show great potential in unstructured environments due to their superior safety and compliance compared to traditional articulated robots. However, their applications are limited by low load-bearing capacity and inter-segment actuation coupling. To address these issues, we propose a novel continuum robot featuring a triple-layer tendon system for intra-segment coupling and an inter-segment actuation decoupling structure. The coupling segment consists of serially connected spherical and socket joints with internal tendon channels. Triple-layer tendons pass through these channels, interconnecting the joints to enable synchronized motion from adjacent joints to the entire segment. For inter-segment decoupling, redirection pulleys are incorporated within the first two segments. Driving tendons for distal segments are routed through these pulleys and exit from the base segment to external actuators, ensuring independent motion of each segment. Experimental results demonstrate that the intra-segment coupling significantly enhances stiffness and load capacity. Motion experiments confirm effective spherical joint coupling and accurate bending angle tracking of sinusoidal references. The comprehensive experimental results of the three-segment continuum robot show that the segments have good decoupling performance and the robot possesses high repeatability positioning accuracy.

RA-L 2026-06-08

Matrix-Free Delassus Operations: Scalable and Memory-Efficient Algorithms

Ajay Suresha Sathya, Louis Montaut, Yann de Mont-Marin, Justin Carpentier

Manipulation & Robot Arms
Abstract

The Delassus matrix, closely related to the operational-space inertia matrix, is a fundamental quantity in robotics with applications in simulation, system identification, and control. Traditional approaches compute and store this matrix explicitly, either in sparse or dense form. In this work, we depart from this convention by treating the Delassus matrix as a matrix-free operator. We derive efficient algorithms with low computational complexity that multiply the Delassus matrix or its damped inverse by an input vector or matrix. Unlike approaches based on explicit matrix construction, our method achieves a linear memory footprint, making it scalable to problems with thousands of constraints and suitable for execution on resource-limited hardware. We implement these matrix-free operations on top of the open-source Pinocchio library and evaluate their performance against state-of-the-art methods that rely on explicit matrix computation. Our benchmarks demonstrate substantial speedups, ranging from approximately $2\times$ to $4\times$ for single-robot scenarios, up to $10\times$ when joint friction is enabled, and over $400\times$ in the Pyramid+Robots contact-rich multi-body scene.
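To make the matrix-free idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes the common definition of the Delassus operator, $G = J M^{-1} J^\top$, and applies it to a vector through three successive products without ever storing $G$. The diagonal mass matrix, the helper names (`mat_vec`, `mat_t_vec`, `delassus_apply`), and the numbers are all illustrative assumptions; a real articulated system has a dense, structured $M$ and would rely on rigid-body algorithms instead.

```python
# Toy sketch: apply G = J M^{-1} J^T to a vector without forming G.
# ASSUMPTION: M is diagonal, only to keep the example dependency-free.

def mat_vec(A, x):
    """Dense product A x, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_t_vec(A, y):
    """Dense product A^T y without building the transpose."""
    n_cols = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n_cols)]

def delassus_apply(J, m_diag, x):
    """Return (J M^{-1} J^T) x using three cheap products."""
    u = mat_t_vec(J, x)                         # J^T x: constraint -> joint space
    v = [ui / mi for ui, mi in zip(u, m_diag)]  # M^{-1} (J^T x)
    return mat_vec(J, v)                        # back to constraint space

# Hypothetical 2-constraint, 3-DoF example.
J = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
m_diag = [2.0, 4.0, 1.0]
y = delassus_apply(J, m_diag, [1.0, 1.0])
print(y)
```

In this toy version the working memory is one joint-space vector and one constraint-space vector, which hints at why an operator formulation can scale more gracefully than storing a matrix whose size grows quadratically with the number of constraints.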

RA-L 2026-06-08

Elephant-Trunk-Inspired Power Soft Robotic Arm

Hiroto Kodama, Ryota Iwahara, Tohru Ide, Yunhao Feng, Hiroyuki Nabae, Gen Endo, et al.

Medical / Soft / Micro-Nano
Abstract

Shape adaptability to conform to surrounding objects in various settings is required for robotic arms performing earthwork in complex and rapidly changing environments such as disaster sites. We therefore focused on soft robotic arms, which possess this characteristic. However, soft robotic arms are mainly focused on flexible motion, and their load capacities are insufficient for practical use in earthwork. Therefore, this paper proposes an elephant-trunk-inspired power soft robotic arm, designed for high load capacity, inspired by the musculature of an elephant's trunk. The arm mainly comprises Pneumatic Power Bags (PPBs), which expand cylindrically with pneumatic pressure, and Hydraulic Artificial Muscles (HAMs). We hypothesized that high load capacity is achieved as PPBs support compressive stress and HAMs support tensile stress under load. We prototyped the arm, and modeling and experimental studies demonstrate that the arm's stiffness can be tuned by varying pneumatic and hydraulic pressures. Additionally, the arm successfully lifted a 294 N weight, and it was shown that the arm has the capacity to passively deform along an object simply by applying pressure, without active control.

RA-L 2026-04-27 · Cited by 1

FAR-AVIO: Fast and Robust Schur-Complement Based Acoustic-Visual-Inertial Fusion Odometry With Sensor Calibration

Hao Wei, Peiji Wang, Qianhao Wang, Yang Gu, Tong Qin, Fei Gao, et al.

Navigation / SLAM / Autonomous Driving
Abstract

Underwater environments impose severe challenges to visual-inertial odometry systems, as strong light attenuation, marine snow and turbidity, together with weakly exciting motions, degrade inertial observability and cause frequent tracking failures over long-term operation. While tightly coupled acoustic-visual-inertial fusion, typically implemented through an acoustic Doppler Velocity Log (DVL) integrated with visual-inertial measurements, can provide accurate state estimation, the associated graph-based optimization is often computationally prohibitive for real-time deployment on resource-constrained platforms. Here we present FAR-AVIO, a Schur-Complement based, tightly coupled acoustic-visual-inertial odometry framework tailored for underwater robots. FAR-AVIO embeds a Schur complement formulation into an Extended Kalman Filter (EKF), enabling joint pose-landmark optimization for accuracy while maintaining constant-time updates by efficiently marginalizing landmark states. On top of this backbone, we introduce Adaptive Weight Adjustment and Reliability Evaluation (AWARE), an online sensor health module that continuously assesses the reliability of visual, inertial and DVL measurements and adaptively regulates their sigma weights, and we develop an efficient online calibration scheme that jointly estimates DVL-IMU extrinsics, without dedicated calibration manoeuvres. Numerical simulations and real-world underwater experiments consistently show that FAR-AVIO outperforms state-of-the-art underwater SLAM baselines in both localization accuracy and computational efficiency, enabling robust operation on low-power embedded platforms. Our implementation has been released as open source software.
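The landmark marginalization mentioned above can be illustrated with a generic Schur-complement solve. This is a textbook sketch under stated assumptions, not FAR-AVIO's EKF formulation: the pose block is fixed at dimension 2, each landmark is a scalar so that the landmark block is diagonal, and the function name `schur_solve` and all numbers are hypothetical. For the block system $\begin{bmatrix} A & B \\ B^\top & D \end{bmatrix} \begin{bmatrix} x_p \\ x_l \end{bmatrix} = \begin{bmatrix} b_p \\ b_l \end{bmatrix}$, the reduced pose system is $S x_p = g$ with $S = A - B D^{-1} B^\top$ and $g = b_p - B D^{-1} b_l$.

```python
def schur_solve(A, B, d_diag, b_p, b_l):
    """Solve [[A, B], [B^T, D]] [x_p; x_l] = [b_p; b_l] with D diagonal.

    A: 2x2 pose block, B: 2xn coupling block, d_diag: n landmark diagonals.
    Landmarks are eliminated first, so only a 2x2 system is solved directly.
    """
    n = len(d_diag)
    # Reduced pose system: S = A - B D^{-1} B^T, g = b_p - B D^{-1} b_l.
    S = [[A[i][j] - sum(B[i][k] * B[j][k] / d_diag[k] for k in range(n))
          for j in range(2)] for i in range(2)]
    g = [b_p[i] - sum(B[i][k] * b_l[k] / d_diag[k] for k in range(n))
         for i in range(2)]
    # Solve the 2x2 reduced system with Cramer's rule.
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    x_p = [(g[0] * S[1][1] - S[0][1] * g[1]) / det,
           (S[0][0] * g[1] - S[1][0] * g[0]) / det]
    # Back-substitute each landmark independently.
    x_l = [(b_l[k] - sum(B[i][k] * x_p[i] for i in range(2))) / d_diag[k]
           for k in range(n)]
    return x_p, x_l

# Hypothetical system with one 2-D pose block and three scalar landmarks.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
d_diag = [2.0, 2.0, 4.0]
b_p = [1.0, 2.0]
b_l = [1.0, 0.0, 2.0]
x_p, x_l = schur_solve(A, B, d_diag, b_p, b_l)
print(x_p, x_l)
```

The size of the system solved directly depends only on the pose dimension; the landmark count enters through sums over a diagonal block, which is the property that makes marginalization attractive on embedded hardware.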

RA-L 2026-06-01

SurfSurg6D: Geometry Consistent Dense Correspondence for Textureless Surgical Instrument Pose Estimation

Daiyun Shen, Shuojue Yang, Chang Han Low, Qian Li, Mengya Xu, Qi Dou, et al.

Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Surgical instrument pose estimation provides crucial information for promising applications, including autonomous robotic surgery, skill assessment, and standardization of surgical workflow. However, this task remains highly challenging due to high precision requirements, frequent occlusions, textureless instruments, scarcity of depth information and very limited annotated data. These constraints often lead to unsatisfactory performance when employing general object pose estimation approaches to surgical scenarios. To address these issues, we first construct a new dataset, SynSurg6D, to alleviate the data shortage in this task. We further propose SurfSurg6D, a dense correspondence framework tailored for surgical instrument pose estimation. Experimental results on the SurgRIPE, EndoVis2018 and SurgPose datasets demonstrate that the introduction of our generated dataset SynSurg6D is able to diversify the pose distributions, thus enhancing the performance of existing approaches. Furthermore, SurfSurg6D outperforms existing methods, providing a robust solution for precise and efficient RGB-only pose estimation. The code and dataset will be open-sourced at https://github.com/StackingDataYeti/SurfSurg6D.

RA-L 2026-06-01

Design of a Cable-Driven, Self-Aligning Knee Exoskeleton to Eliminate Joints’ Misalignment Across Users With Varying Leg Dimensions

Xinhua Yang, Sheng Guo, Lianzheng Niu, Duxin Liu, Majun Song, Peiyi Wang, et al.

Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract

One of the primary difficulties in exoskeleton use is the misalignment between the device's mechanical joints and the user's natural biological joints. Knee exoskeletons, widely studied for assisting walking and stand-to-sit transitions, typically require precise adjustment of the rigid frame to match the user's knee joint axis. If not properly aligned, parasitic forces arise at the physical human–robot interface (pHRI), causing significant discomfort. To address this problem, we propose a cable-driven, self-aligning knee exoskeleton that accommodates different wearing positions without the need for accurate pre-use alignment. Platform experiments showed that the exoskeleton adapted well to variations in wearing position in the deactivated mode and transmitted assistive torque with limited changes in parasitic forces when activated. At an assistive torque of 18 Nm, the relative error between the measured and target values was 7.27%. Human-subject experiments further demonstrated substantial reductions in relative sliding distance (RSD) and parasitic forces during activities such as walking and stand-to-sit transitions: the maximum RSD was reduced by 84.9% compared with the Orthosis condition and by 85.1% compared with the Locked condition; the maximum parasitic force was reduced by 89.1% and 89.0%, respectively, and the maximum parasitic torque was reduced by 95.6% and 96.1%, respectively.

RA-L 2026-06-01

G-DRAGON: Geospatial Reasoning and Dynamic Planning for Retrieval-Augmented Outdoor Navigation

Dongzhihan Wang, Yi Du, Jianan Sun, Yuan Xue, Yingchen Zhang, Bing Xiao, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Autonomous ground robots operating in large-scale outdoor environments require both robust long-range navigation and fine-grained “last-mile” exploration. Current advances in visual-language navigation (VLN) work well on short-range tasks but lack geospatial grounding for long-distance missions. Some OpenStreetMap (OSM)-based methods relying on cloud-based Large Language Models (LLMs) are prone to factual hallucination and cannot conduct “last-mile” exploration based on human instruction. To address these challenges, we present G-DRAGON, a retrieval-augmented framework for outdoor, open-world navigation. This framework maps natural-language commands to versioned, local OSM entities via generative retrieval based on a lightweight LLM, yielding accurate coordinates for global route planning. A high-level planning module bridges global topological routes with the SLAM system, projecting geospatial waypoints into the robot's navigable frame. For the “last mile,” the framework transitions to frontier-based exploration and open-set semantic voxel mapping to localize open-vocabulary targets. Experimental results in simulation demonstrate our framework outperforms state-of-the-art baselines. Furthermore, we validate the system in unseen real-world urban environments on an Unmanned Ground Vehicle (UGV), successfully completing person-search missions with trajectories of up to 500 m.

RA-L 2026-06-01

SOCC-ICP: Semantics-Assisted Odometry Based on Occupancy Grids and ICP

Johannes Scherer, Sebastian Hirt, Henri Meeß

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Reliable pose estimation in previously unseen environments is a fundamental capability of autonomous systems. Existing LiDAR odometry methods typically employ point-, surfel-, or NDT-based map representations, which are distinct from the semantic occupancy grids commonly used for downstream tasks such as motion planning. We introduce SOCC-ICP, a semantics-assisted odometry framework that jointly performs Semantic OCCupancy grid mapping and LiDAR scan alignment. Each map voxel encodes geometric and semantic statistics, enabling adaptive point-to-point or point-to-plane ICP based on local planarity. Further, the occupancy grid naturally filters dynamic objects through raycasting-based free-space updates. Across diverse evaluation scenarios, SOCC-ICP achieves performance competitive with state-of-the-art LiDAR odometry and remains robust in geometrically degenerate environments, even in the absence of semantic cues. When semantic labels are available, integrating them into map construction, downsampling, and correspondence weighting yields further accuracy gains. By unifying odometry and semantic occupancy grid mapping within a single representation, SOCC-ICP eliminates redundant map structures and directly provides a map suitable for downstream robotic applications.
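The switch between point-to-point and point-to-plane residuals based on local planarity can be sketched as follows. This is an illustrative guess at the general pattern, not SOCC-ICP's implementation: the per-voxel `planarity` score in [0, 1], the `threshold` of 0.8, and the function name `icp_residual` are assumptions introduced here, and the voxel normal is assumed to be unit length.

```python
import math

def icp_residual(point, voxel_mean, voxel_normal, planarity, threshold=0.8):
    """Distance between a scan point and its matched voxel.

    planarity and threshold are HYPOTHETICAL quantities for this sketch.
    Planar voxels use a point-to-plane distance; others fall back to
    a point-to-point distance.
    """
    diff = [p - m for p, m in zip(point, voxel_mean)]
    if planarity >= threshold:
        # Point-to-plane: offset projected onto the (unit) surface normal.
        return abs(sum(d * n for d, n in zip(diff, voxel_normal)))
    # Point-to-point: full Euclidean distance to the voxel mean.
    return math.sqrt(sum(d * d for d in diff))

point = (0.3, 0.4, 0.1)
mean = (0.0, 0.0, 0.0)
normal = (0.0, 0.0, 1.0)
print(icp_residual(point, mean, normal, planarity=0.95))  # planar voxel
print(icp_residual(point, mean, normal, planarity=0.20))  # non-planar voxel
```

On a planar voxel the in-plane offset is ignored, so the scan can slide along the surface, while on an unstructured voxel the full offset is penalized.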

RA-L 2026-06-01

LAPS: Improving Incremental LiDAR Mapping Using Active Pooling and Sampling for Neural Distance Fields

Dongjae Lee, Wooseong Yang, Yifu Tao, Maurice Fallon, Ayoung Kim

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Neural distance fields offer a compact and continuous representation of 3D geometry, making them attractive for incremental LiDAR mapping. However, their online optimization is vulnerable to catastrophic forgetting, where new observations can degrade previously reconstructed geometry. Replay-based training is commonly used to address this issue, but existing methods typically rely on passive replay buffers and uniform sampling, which can waste memory on redundant observations and under-train poorly constrained regions. We propose LAPS, a replay management framework for incremental neural mapping that improves both replay retention and replay allocation during online updates. LAPS combines reliability-based active pooling to retain reliable historical samples under limited memory with uncertainty-guided active sampling to focus optimization on under-constrained regions. Experiments on synthetic and real-world benchmarks show that LAPS consistently improves reconstruction completeness while maintaining competitive geometric accuracy. On Oxford Spires, it improves recall by 4.66 pp and F1-score by 3.79 pp over PIN-SLAM on the Blenheim Palace 05 sequence. We release our open source implementation at: https://github.com/dongjae0107/LAPS.
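The two replay mechanisms described above can be sketched in a few lines. This is a simplified illustration under assumptions, not the LAPS code: each sample is a dictionary with hypothetical `reliability` and `uncertainty` scores, pooling keeps the top entries by reliability, and sampling draws with probability proportional to uncertainty (at least one uncertainty must be positive).

```python
import random

def active_pool(samples, capacity):
    """Keep the `capacity` most reliable samples (hypothetical score)."""
    ranked = sorted(samples, key=lambda s: s["reliability"], reverse=True)
    return ranked[:capacity]

def active_sample(pool, k, rng):
    """Draw k replay samples, weighted by per-sample uncertainty."""
    weights = [s["uncertainty"] for s in pool]
    return rng.choices(pool, weights=weights, k=k)

# Hypothetical replay candidates.
samples = [
    {"id": "a", "reliability": 0.2, "uncertainty": 0.5},
    {"id": "b", "reliability": 0.9, "uncertainty": 0.0},
    {"id": "c", "reliability": 0.7, "uncertainty": 1.0},
]
pool = active_pool(samples, capacity=2)            # drops the least reliable
batch = active_sample(pool, k=5, rng=random.Random(0))
print([s["id"] for s in pool], [s["id"] for s in batch])
```

In contrast to a passive buffer with uniform sampling, the memory budget goes to trustworthy observations and the training budget goes to regions that are still poorly constrained.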

RA-L 2026-06-01

DMT-CPP: A Delaunay-Graph-Based Framework for Real-Time Multi-Robot Online Target Coverage

Pengyu Wang, Zikai Wang, Ling Shi, Max Q.-H. Meng

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms
Abstract

In this article, we propose a Delaunay-graph-based Multi-Robot Online Target Coverage Path Planner (DMT-CPP) to enhance multi-robot systems' performance in online target coverage tasks. We present an innovative task allocation mechanism by maintaining a Delaunay graph according to the real-time condition. By applying K-means clustering and pairwise optimization on this graph, this mechanism can achieve computationally efficient and equitable task distribution. Additionally, we propose a novel path planning mechanism that utilizes the Delaunay graph to determine the coverage target and plan the tracking path, guaranteeing the algorithm's real-time performance within dynamic workspaces. Extensive comparative experiments demonstrate DMT-CPP's exceptional coverage efficiency, in terms of reduced coverage completion time, relative to state-of-the-art methods and confirm its effectiveness in real-time applications. Real-world experiments further validate the planner's effectiveness and robustness in practical implementations.

RA-L 2026-06-01

HIVE-6D: Hierarchical Visual-Tactile In-Hand Pose Estimation and Tracking

Jin Liu, Weihao Wang, Hongzhan Yu, Can Zhao, Haonan Zhao, Daolin Ma, et al.

Manipulation & Robot Arms · Perception & Sensing
Abstract

Robust in-hand 6D pose estimation is essential for dexterous robotic manipulation. Existing approaches typically rely on vision or tactile sensing, yet both modalities have inherent limitations: vision provides global pose cues but degrades under hand–object occlusion, while tactile remains available during contact but is local and often geometrically ambiguous. Moreover, their reliability varies across manipulation stages, making effective fusion challenging. In this work, we propose HIVE-6D, a hierarchical visual–tactile framework with two-level fusion for in-hand 6D pose estimation and tracking. At the feature level, an asymmetric anchor-and-refine design first establishes a global pose anchor from vision and then performs tactile residual refinement using a render-and-compare strategy with vision-guided cross-attention. At the policy level, an uncertainty-aware routing mechanism dynamically selects the trusted modality and refinement anchor over time to stabilize tracking under contact and occlusion. Extensive experiments in simulation and on a real robot show that HIVE-6D improves single-frame pose accuracy, strengthens long-horizon tracking robustness, and increases success rates on a real-world peg insertion task, including under external disturbances.

RA-L 2026-06-01

Feedback-Guided Hierarchical Transmission for Multi-Robot Mapping Under Bandwidth Constraints

Guanchong Niu, Shuwake Nuerdaolieti, Jinglei Li, Zewei Jing, Lanxiang Hou, Jiayi Sun

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms
Abstract

As multi-robot mapping systems expand into data-intensive environments, efficient data transmission methods are essential for maintaining global consistency and safety under scarce and time-varying bandwidth. In this work, we propose a Feedback-Guided Hierarchical Transmission framework for centralized multi-robot mapping. In this framework, the submap representation is structured with three layers: L1 extracts the pose-and-factor skeleton to ensure topological consistency with minimal overhead; L2 streams geometric details, prioritized by semantic-weighted entropy reduction to refine the occupancy map; L3 delivers semantic labels to support high-level understanding. To optimize transmission, the backend performs centralized occupancy fusion and employs Truncated Least Squares Graduated Non-Convexity (TLS–GNC) pose-graph optimization and a Lambda-Field risk model to produce guidance signals. These signals dynamically reshape the utility curves of the layers to prioritize loop-closure consistency and safety-critical regions. Experiments are conducted on S3DIS and ScanNet indoor RGB-D datasets to validate the effectiveness of the proposed framework. The performance evaluation under bandwidth sweeps and packet loss tests indicates that our method reaches joint geometry-and-semantics accuracy with fewer bytes, recovers loop closures faster, and lowers estimated collision risk in rendezvous regions compared with non-hierarchical baselines.

RA-L 2026-06-01

ESF-3Det: Edge-Semantic Fusion With Uncertainty Awareness for Robust LiDAR-Based 3D Object Detection

Huishan Wang, Jie Ma, Jianlei Zhang, Fangwei Chen

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

LiDAR-based 3D object detection is fundamental for real-world perception, providing accurate geometric information for downstream tasks. However, performance often degrades under challenging conditions such as occlusion, sparse point clouds, and small-scale objects. To address these issues, we propose ESF-3Det, a unified Edge–Semantic Fusion framework that jointly models geometric structure, semantic context, and predictive uncertainty for precise boundary localization. The framework comprises three core components: an Edge Pyramid Network (EPN) that extracts hierarchical boundary features from bird's-eye view representations; an Edge-Guided Attention (EGA) module that aligns semantic features with geometric contours to mitigate semantic drift; and an Uncertainty-Aware Fusion (UAF) mechanism that adaptively integrates multi-level features in ambiguous regions based on semantic entropy and edge cues. By jointly leveraging boundary cues, semantic guidance, and uncertainty modeling, ESF-3Det produces boundary-consistent and uncertainty-aware representations, enhancing detection robustness. Extensive experiments on the KITTI and Waymo datasets demonstrate consistent improvements, including a 6.55% AP gain for heavily occluded pedestrians, validating the effectiveness and generalizability of the proposed framework.

RA-L 2026-06-01

DIRA: Diffusion-Based Imitation-to-Reinforcement Adaptation for Task Automation of Surgical Robots

Guowei Wang, Xinan Sun, Yuan Xing, Rui Cao, Zhikang Ma, Xuan Zhang

Robot Learning · Medical / Soft / Micro-Nano
Abstract

The autonomy of surgical robots can enhance the safety and efficiency of surgeries. However, the diverse surgical skill expressions in expert demonstrations create multimodal patterns that can lead to suboptimal outcomes under unimodal policy learning methods. Additionally, there can be distribution shifts due to imbalances between positive and negative samples in surgical task data, and the learned policies may fail when encountering unseen states or external perturbations. To address this, we propose Diffusion-Based Imitation-to-Reinforcement Adaptation (DIRA), a three-stage framework that leverages a conditional diffusion-based implicit policy for multimodal action modeling, a value-consistent critic warm-up for stable value estimation, and an online strategic perturbation adaptation for smooth imitation-to-reinforcement learning transition. Experiments on the SurRoL platform show that the proposed DIRA achieves a higher success rate than existing methods, with an average improvement of 24 percentage points and up to 39 percentage points on complex tasks. Moreover, the experiments with Real World Demonstrations further demonstrate the potential of DIRA for real-world surgical task automation.

RA-L 2026-06-01

Social Robot Navigation Under Kinodynamic Constraints Using Learning-Informed Sampling for Indoor Environments

Steven Silva, Victor Romero-Cano, Juan David Hernández

Navigation / SLAM / Autonomous Driving · Human-Robot Interaction / Teleoperation
Abstract

With the inclusion of robots in social spaces to assist in service tasks (e.g. waitering and guiding people), having robots that move in a socially acceptable manner has become a requirement. In such dynamic environments, robots might show sudden stops and unexpected changes in acceleration, which can negatively affect the robots' predictability and legibility. Previous work has attempted to reduce such acceleration changes by generating paths that are both kinodynamically feasible and socially acceptable. However, most of these approaches not only fail to guarantee path optimality but also involve a computational cost that limits the robot's online navigation capabilities. In this paper, we present a social robot navigation (SRN) framework designed to overcome these limitations through three blocks: 1) world representation, 2) multilayered path planning, and 3) path-following control. Our approach incorporates a CNN-driven informed sampling strategy, which improves the robot's path planner performance to meet online computation constraints. We extensively benchmark our framework against state-of-the-art approaches in a variety of simulated scenarios, and demonstrate its feasibility with the Reachy robot in real-world tests.

RA-L 2026-06-01

LogicFlow: Amortizing Signal Temporal Logic Into Continuous Vector Fields for Safe Driving

Shaochen Wang, Qilin Wu, Yanjie Xia

Robot Learning · Control & Dynamics
Abstract

Navigating complex traffic environments requires automated vehicles to reconcile the versatility of human-like behavior with strict adherence to traffic regulations. While diffusion-based generative models excel at capturing multimodal distributions, their fundamental reliance on iterative denoising along stochastic, high-curvature probability paths renders the integration of formal safety constraints computationally prohibitive and geometrically unstable. To bridge this gap, we propose LogicFlow, a framework that aligns generative dynamics with driving physics via logic-embedded flow matching. Unlike diffusion models that denoise abstract latent states, LogicFlow learns a continuous velocity field directly mapped to vehicle kinematics, constructing straight probability paths that facilitate the seamless embedding of signal temporal logic (STL). Central to our approach is an amortized logic rectification mechanism, which treats STL robustness gradients as intrinsic potential forces to steer the vector field during training. This effectively distills safety constraints into the model weights, enabling the generation of highly compliant trajectories without expensive inference-time optimization. Extensive evaluations on the nuScenes dataset demonstrate that our framework redefines the safety-efficiency trade-off, outperforming state-of-the-art baselines in rule satisfaction and collision avoidance while reducing inference latency.
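As a rough illustration of the STL robustness signal mentioned above: for an "always" formula such as G(speed <= v_max), the standard quantitative robustness is the smallest margin along the trajectory, which is positive exactly when the rule holds at every step. The sketch below is not LogicFlow's implementation; the speed limit, the sample speed profiles, and the function name are hypothetical.

```python
from typing import Sequence


def always_leq_robustness(signal: Sequence[float], bound: float) -> float:
    """Robustness of the STL formula G(signal <= bound): the worst-case margin."""
    if not signal:
        raise ValueError("signal must contain at least one sample")
    return min(bound - value for value in signal)


V_MAX = 15.0  # hypothetical speed limit in m/s
compliant = [12.0, 13.5, 14.0, 13.0]  # never exceeds the limit
violating = [12.0, 15.5, 16.0, 14.0]  # exceeds the limit by up to 1 m/s

rho_ok = always_leq_robustness(compliant, V_MAX)
rho_bad = always_leq_robustness(violating, V_MAX)
print(rho_ok, rho_bad)
```

Because the value is a signed margin rather than a pass/fail flag, it can in principle be differentiated with respect to the trajectory, which is the kind of gradient the abstract describes as a steering force during training.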

RA-L 2026-06-01

PLATO Hand: Shaping Contact Behavior With Fingernails for Precise Manipulation

Dong Ho Kang, Aaron Kim, Mingyo Seo, Kazuto Yokoyama, Tetsuya Narita, Luis Sentis

Manipulation & Robot Arms · Control & Dynamics
Abstract

We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that combines a rigid fingernail, embedded distal phalanx, and compliant pulp to shape contact behavior during manipulation. By mechanically organizing how contact is initiated, supported, and transmitted at the fingertip, this structure creates stable and task-relevant contact conditions across diverse object geometries and grasp orientations. We develop a strain-energy-based bending–indentation model to guide the fingertip design and to explain how material stiffness and contact geometry govern deformation partitioning within the fingertip. Experiments show improved pinch stability, improved fingernail-mediated dorsal-contact force transmission and proprioceptive observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. These results show that coupling a mechanically structured contact interface with a force-motion-transparent finger mechanism provides a principled approach to precise manipulation.

JFR 2026-06-16

Research on Orchard Navigation Path Planning Based on 3D LiDAR SLAM Considering Terrain Roughness

Yiting Chen, Jiali Fan, Chenglong Li, Boliao Li, Zhenbo Wei, Jun Wang

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Control & Dynamics
Abstract

Autonomous navigation in orchards is essential for enhancing operational efficiency and ensuring safe agricultural operations. However, autonomous navigation in orchard environments presents significant challenges due to uneven surfaces and limited visual information in natural environments. To address these issues, this study proposed a shortest‐path planning method for autonomous orchard navigation based on 3D LiDAR SLAM. First, a global 3D map was constructed using the LIO‐SAM algorithm. Ground points were then separated using the Cloth Simulation Filter (CSF), and terrain roughness information was extracted from the ground point cloud to identify rugged areas that might compromise robot stability. In parallel, an improved Random Forest model was used to segment fruit‐tree points, after which DBSCAN was applied to extract individual tree centers and the Kernel Density Estimation (KDE) method was used to estimate tree‐row direction. Finally, a cost map integrating fruit‐tree distribution and terrain roughness information was constructed, and an improved A* algorithm was employed to generate efficient and terrain‐adaptive paths. The proposed method was evaluated in both a simulation and a real pear orchard. The results showed that the proposed approach reduced traversal over rugged terrain by more than 50% and lowered estimated energy consumption by nearly 48%, while maintaining comparable path lengths and high computational efficiency. Field experiments further demonstrated reliable path‐following performance, with average lateral and longitudinal deviations within 0.18 meters and heading deviation below 3.1°. These findings highlight the practical value of incorporating terrain roughness into path planning for robust and efficient orchard navigation.
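The final planning step, searching a cost map that penalizes rough terrain, can be sketched with a plain A* on a small grid. This is a minimal sketch under stated assumptions, not the paper's improved A*: the grid, the roughness values, the 4-connected moves, and the `weight` parameter are hypothetical, and the tree-distribution term of the cost map is left out.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def plan(roughness: List[List[float]], start: Cell, goal: Cell,
         weight: float = 5.0) -> Optional[List[Cell]]:
    """A* on a 4-connected grid; entering a cell costs 1 + weight * roughness."""
    rows, cols = len(roughness), len(roughness[0])

    def heuristic(cell: Cell) -> float:
        # Manhattan distance is admissible because every step costs at least 1
        # (roughness values are assumed to be non-negative).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Cell] = {}
    frontier = [(heuristic(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            cost = best_cost[cell] + 1.0 + weight * roughness[nxt[0]][nxt[1]]
            if cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = cost
                parent[nxt] = cell
                heapq.heappush(frontier, (cost + heuristic(nxt), nxt))
    return None


# Hypothetical roughness map: a rugged strip in the middle column.
grid = [
    [0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0],
]
path = plan(grid, (0, 0), (0, 2))
print(path)
```

With these values the planner accepts a longer route around the rugged strip instead of the two-step route through it, which is the trade-off between path length and terrain roughness that the abstract reports.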

RA-L 2026-06-12

Embroidery Actuator Utilizing Embroidery Patterns to Generate Diverse Fabric Deformations

Yuki Ota, Yuki Funabora

Abstract

This paper presents a novel Embroidery Actuator, a fabric-integrated pneumatic actuator that enables diverse and controllable deformations through embroidery pattern design. Unlike conventional fabric actuators that rely on fiber- or thread-shaped actuators, the proposed actuator is fabricated by directly stitching an inflatable tube onto the fabric using a cord-embroidery technique. The embroidered thread and the fabric jointly form a sleeve that constrains the expansion of the inflatable tube, converting internal pressure into targeted bending or stretching deformations. By varying the embroidery pattern, such as zigzag or cross configurations, different geometric constraints can be realized, allowing for flexible control of deformation direction and magnitude. Analytical deformation models based on the Neo-Hookean model and Lagrange's equations were developed to predict the relationship between pneumatic pressure and bending angle. Experiments demonstrated that the actuator achieved 47 degrees of flexion on the fabric surface side and 165 degrees on the reverse side by altering the embroidery pattern. Additionally, the created model expressed deformation with an error margin of several degrees.

RA-L 2026-06-12

Variable Stiffness Caudal Peduncle Enables Higher Propulsion Performance of a Robotic Fish

Xiaofei Wang, Xiang Li, Lixia Yan, Shiji Song

Abstract

Inspired by the biological mechanism of fish caudal peduncles, which are key structures connecting musculature to the caudal fin and modulating stiffness for controlled energy transfer to enable high maneuverability across diverse swimming scenarios, this letter proposed a variable-stiffness caudal peduncle for a robotic fish that integrates thermoplastic polymer polycaprolactone, using temperature control to modulate the molten state of the material and thereby adjust stiffness. This structure enables the formation of an optimal body profile across a wide frequency range, enhancing propulsion performance. The Pseudo-Rigid-Body Model and Lagrangian method were used to model the flexible caudal peduncle and dynamic behavior of the robotic fish, respectively. Thrust results show that the caudal peduncle, in its molten state, exhibits superior performance at low frequencies, while in its solid state, it performs better at high frequencies. Simulations and experiments revealed an optimal stiffness for maximum thrust, with a peak average thrust of 0.72 N. Untethered swimming tests confirmed that temperature-based stiffness regulation of the PCL molten state enables a maximum speed of 0.47 m/s (0.88 body lengths per second) and a minimum cost of transport of 68.9 J/m/kg.
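The two speed figures quoted above imply a body length, and the cost-of-transport unit (J/m/kg) implies a mass-normalised definition. The short sketch below works both out. It assumes the usual definition COT = P / (m v); the power, mass, and speed passed to `cost_of_transport` are hypothetical and do not come from the paper.

```python
def body_length(speed_m_s: float, speed_bl_s: float) -> float:
    """Body length implied by a speed quoted in both m/s and body lengths per second."""
    return speed_m_s / speed_bl_s


def cost_of_transport(power_w: float, mass_kg: float, speed_m_s: float) -> float:
    """Mass-normalised cost of transport in J/(m*kg): energy per metre per kilogram."""
    return power_w / (mass_kg * speed_m_s)


# Figures quoted in the abstract: 0.47 m/s corresponds to 0.88 body lengths per second.
length_m = body_length(0.47, 0.88)
print(f"implied body length: {length_m:.3f} m")

# Hypothetical inputs, only to show how the units combine.
example_cot = cost_of_transport(power_w=20.0, mass_kg=1.0, speed_m_s=0.4)
print(f"example cost of transport: {example_cot:.1f} J/m/kg")
```

The implied body length is roughly 0.53 m, which is a useful sanity check when comparing the reported speed against other robotic fish of different sizes.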

Sci. Robotics 2026-04-15

A careful examination of large behavior models for multitask dexterous manipulation

Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, et al.

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. Although these models have garnered considerable enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, limiting the pace of development and inhibiting a nuanced understanding of current capabilities. Here, we rigorously evaluated multitask robot manipulation policies, referred to as large behavior models, by extending the diffusion policy paradigm across a corpus of simulated and real-world robot data. We proposed and validated an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compared against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We found that multitask pretraining made the policies more successful and robust and enabled teaching complex new tasks more quickly, using a fraction of the data when compared with single-task baselines. Moreover, performance predictably increased as pretraining scale and diversity grows.

T-RO 2026-04-28

Allocation for Omnidirectional Aerial Robots: Incorporating Power Dynamics

Eugenio Cuniato, Mike Allenspach, Thomas Stastny, Helen Oleynikova, Roland Siegwart, Michael Pantic

UAVs / Aerial Robots · Manipulation & Robot Arms · Control & Dynamics
Abstract

Tilt-rotor aerial robots are more dynamic and versatile than fixed-rotor platforms, since the thrust vector and body orientation are decoupled. However, the coordination of servos and propellers (the allocation problem) is not trivial, especially accounting for overactuation and actuator dynamics. We incrementally build and present three novel allocation methods for tilt-rotor aerial robots, comparing them to state-of-the-art methods on a real system performing dynamic maneuvers. We extend the state-of-the-art geometric allocation into a differential allocation, which uses the platform's redundancy and does not suffer from singularities. We expand it by incorporating actuator dynamics and propeller power dynamics. These allow us to model dynamic propeller acceleration limits, bringing two main advantages: balancing propeller speed without the need for nullspace goals and allowing the platform to selectively turn off propellers during flight, opening the door to new manipulation possibilities. We also use actuator dynamics and limits to normalize the allocation problem, making it easier to tune and allowing it to track 70% faster trajectories than a geometric allocation.

RA-L 2026-06-04

OCR-3D: Occlusion-Robust Context-Guided Relative Depth for Monocular 3D Object Detection

Jiasheng Su, Chenchao Hu, Yayan Xu, Siyu Chen, Yang Sun, Jinhe Su

Perception & Sensing
Abstract

Most existing monocular 3D object detection methods primarily rely on object-centric foreground regions to infer object depth. However, the estimated depth often becomes unreliable under occlusion, leading to inaccurate 3D detection. Notably, real-world robotic scenes are not composed solely of foreground elements. Context regions such as roads and buildings contain rich and relatively stable structure cues, including the global depth layout and distance-dependent scale gradients, most of which can provide valuable information for object depth inference under occlusion. Motivated by this, we propose OCR-3D, a context-guided relative-depth framework that assists depth estimation from foreground regions. Specifically, we first use the foreground mask predicted by a region decoupling module to decompose global features into foreground features and context features, thereby preventing the foreground from dominating subsequent context modeling. Building upon this, we introduce a relative-depth prediction branch that learns relative-depth representations of the foreground from context regions, and injects them into foreground depth prediction via a depth cross-attention mechanism. Experimental results on multiple benchmark datasets demonstrate that our method outperforms state-of-the-art methods while maintaining an efficient monocular inference pipeline.

RA-L 2026-06-04

FillFusion-GS: From Sparse Scans to Dense Scenes via Visual-Structure-Guided 3D Gaussian Splatting

Yunda Sun, Zhong Wang, Lin Zhang, Ying Shen, Shengjie Zhao

Perception & Sensing
Abstract

Achieving high-fidelity dense 3D reconstruction remains a critical challenge in autonomous robotics. While 3D Gaussian Splatting (3DGS) has shown promise, its application with LiDAR is severely hindered by the mismatch between the dense initialization required by 3DGS and the inherent sparsity and limited field of view of LiDAR. This data deficiency results in inter-scan voids and unobserved structural voids, which force Gaussian primitives to undergo disordered deformation and splitting during optimization, ultimately degrading reconstruction quality. To bridge this gap, we propose FillFusion-GS, a novel geometry-aware hybrid explicit-implicit LiDAR-Camera fusion reconstruction framework. Unlike existing methods, our method prioritizes establishing a structurally complete and dense spatial distribution of primitives before optimization. Specifically, implicit neural representations are employed to extract continuous scene surfaces from sparse LiDAR scans to resolve inter-scan voids. Building upon this, a Visual Planar-Guided Structure Completion (VP-GSC) module is introduced to leverage 2D visual cues for completing unobserved structural voids. With this complete primitive spatial distribution established, a tightly-coupled optimization scheme is further incorporated to ensure cross-modal consistency, enabling both high-fidelity rendering and accurate geometry. Additionally, a multi-sensor indoor dataset named the Jishi dataset is collected using a custom handheld device, allowing for flexible trajectories. Extensive experiments on public benchmarks and the Jishi dataset demonstrate that FillFusion-GS effectively fills geometric voids, outperforming state-of-the-art methods in terms of both geometric accuracy and visual rendering quality.

RA-L 2026-06-04

LTGNet: Linear Temporal 3D Object Detection With Geometric-Robust Encoding and Frequency-Domain Smoothing

Chenhao Ma, Zhifeng Rao, Zhiyun Lin

Perception & Sensing
Abstract

This paper introduces LTGNet, a robust and efficient framework designed for temporal 3D object detection from long-range LiDAR sequences. Existing sequential detectors face three key challenges: the motion sensitivity of Cartesian representations, the accumulation of noise and misalignment over long horizons, and the quadratic cost of attention-based temporal fusion. LTGNet addresses these issues with three efficient components. A tetrahedral geometric encoder produces rotation- and translation-robust local geometric features to enhance temporal consistency; a frequency-domain Gaussian module suppresses high-frequency noise and minor misalignments across time; and a Mamba-based linear temporal model equipped with a spatiotemporal shift mechanism enables global reasoning with $O(N)$ complexity. Experiments on the Waymo Open Dataset demonstrate that LTGNet achieves competitive or superior accuracy while significantly improving efficiency, making it a strong solution for long-term LiDAR-based 3D detection.

IJRR 2026-04-30

Real-time position and orientation estimation in dual self-propelled swimmers using artificial lateral line systems with intermittent swimming strategy

Ruosi Liu, Yufan Zhai, Qianming Huang, Zhenyu Ding, Yang Ding, Guangming Xie

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

In nature, fish leverage lateral lines to detect subtle variations in flow velocity and pressure. Inspired by this, artificial lateral line systems (ALLS) have been developed, with notable success in underwater target recognition and localization. However, due to the inherent complexity of fluid dynamics, the perceptual capabilities of current systems remain significantly inferior to those of real fish, typically limited to detecting simple objects under fixed spatial configurations between the perceiver and target. In this work, the perception capabilities of ALLS are extended to a more challenging scenario, wherein a free-swimming robotic boxfish is enabled to estimate, in real time, both the position and orientation of a free-swimming robotic koi carp. To improve signal fidelity, a bio-inspired intermittent swimming pattern is introduced to reduce the impact of the perceiver’s own oscillations on the flow field and suppress sensor noise. A hybrid network architecture is proposed to extract informative features from complex vortex-induced pressure signals, wherein an attention mechanism is incorporated to facilitate enhanced spatiotemporal feature extraction across sensor channels. This architecture outperforms conventional models in both accuracy and efficiency. Extensive experiments on both Computational Fluid Dynamics and real-world platforms demonstrate that the perceiver can precisely infer the target’s position and orientation from local pressure data alone. These results affirm the robustness of the proposed method and shed light on the intermittent behaviors in real fish, offering new avenues for bio-inspired robotic perception.

JFR 2026-06-12

DynaSki: A Robust Locomotion Framework for Dynamic Skiing Robot on Challenging Terrains

Tenghui Wang, Zhijun Chen, Yunpeng Yin, Limin Yang, Liangyu Wang, Fangyong Yu, et al.

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Existing skiing robots are severely limited in speed and terrain adaptability, because their controllers neglect to explicitly model the complex ski‐snow interaction. This paper presents DynaSki, a complete locomotion framework that overcomes this limitation through a novel model predictive controller (SKIMPC) centered on a Line‐Ski Contact (LSC) model. The LSC model, formulated as a specialized Contact Wrench Cone (CWC), enables the controller to optimize the contact wrench at the ski edge. This core controller is synergistically integrated within the complete DynaSki framework, which also includes: an analytical kinematic solver for the robot's unique parallel mechanism to ensure precise wrench mapping; a dynamics‐aware trajectory planner that converts user commands into feasible trajectory references; and a dedicated landing controller to robustly manage airborne phases. The effectiveness of DynaSki is validated in real world experiments. The robot achieved a top speed exceeding 18 m/s on a steep trail (max inclination 22.8°), traversed wavy slopes with undulations, and executed a stable landing after a 0.3 s airborne phase for the first time in this field.

RA-L 2026-06-08

Locally Minimized Jerk Energy Based RRT*: A Fast Algorithm for Generating Paths With $G^{2}$ Continuity and Low Curvature Variation

Rahul, S. K. Saha, S. M. Ishtiaque, Dipayan Das

Abstract

This work proposes an algorithm based on RRT* that generates a $G^{2}$-continuous path from a start position to a goal position. Additionally, the generated paths satisfy initial heading constraints, bounded absolute curvature, and reduced variation of curvature. The proposed algorithm employs cubic Bézier curves for tree extension and quintic Bézier curves for rewiring, with their control points computed under the minimum jerk energy criterion. The cubic curves admit closed form solutions enabling fast tree growth, while quintic curves enable rewiring which preserves curvature continuity across the path. Results show that the proposed planner produces paths with lower curvature variation than comparable planners at competitive planning times. Additionally, the curvature and curvature rate profiles confirm that the proposed planner remains well below the allowable bound with fewer abrupt transitions. The resulting paths turn gradually around obstacles, producing natural motion suited for passenger vehicles, wheelchairs, and transport of delicate materials.
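The curvature bound mentioned above can be checked directly from a segment's control points, since the curvature of a planar cubic Bézier curve follows from its first and second derivatives. The sketch below uses the standard signed-curvature formula; the control points are hypothetical, and it does not reproduce the paper's minimum-jerk-energy choice of control points.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier_curvature(ctrl: Sequence[Point], t: float) -> float:
    """Signed curvature of a planar cubic Bezier curve at parameter t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    # First derivative B'(t)
    dx = 3.0 * (s * s * (x1 - x0) + 2.0 * s * t * (x2 - x1) + t * t * (x3 - x2))
    dy = 3.0 * (s * s * (y1 - y0) + 2.0 * s * t * (y2 - y1) + t * t * (y3 - y2))
    # Second derivative B''(t)
    ddx = 6.0 * (s * (x2 - 2.0 * x1 + x0) + t * (x3 - 2.0 * x2 + x1))
    ddy = 6.0 * (s * (y2 - 2.0 * y1 + y0) + t * (y3 - 2.0 * y2 + y1))
    speed_sq = dx * dx + dy * dy
    if speed_sq == 0.0:
        raise ValueError("curvature is undefined where the tangent vanishes")
    return (dx * ddy - dy * ddx) / speed_sq ** 1.5


# Hypothetical S-shaped segment.
ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
kappa_start = cubic_bezier_curvature(ctrl, 0.0)
max_abs_kappa = max(abs(cubic_bezier_curvature(ctrl, i / 50)) for i in range(51))
print(kappa_start, max_abs_kappa)
```

Sampling the absolute curvature along a segment and comparing the maximum with a vehicle's limit gives a simple feasibility check. $G^{2}$ continuity additionally requires that position, tangent direction, and curvature agree where consecutive segments meet.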

RA-L 2026-06-08

Enhanced Soft Actuator With High Motion Potential and Multi-Drive Solutions Via Bidirectional Motion and Reconfigurable Strategies

Baiyi Wang, Jinhong Du, Haozhi Sun, Zihan Li, Xinhua Liu, Ning Tan

Abstract

Addressing the issues of low-level development of existing soft actuators' motion potential and insufficient reconfigurability, a bionic reconfigurable soft actuator with high motion potential and multi-drive solutions (BRSA) is proposed in this paper. Inspired by the structure characteristics and motion properties of lanterns, a drive structure is designed that can perform the bidirectional motion in its initial state. Through designed reconfigurable strategies, its bidirectional motion capability is introduced into the BRSA. Based on in-depth modelling and simulation, the key parameters affecting the drive's performance are optimized, ultimately achieving its optimal design for bidirectional motion capability. Experiment results verify the significant enhancement effect of BRSA's bidirectional motion capability on its motion potential. It demonstrates significant advantages in terms of dynamic performance, motions under nearly ten times loads, and repeatability. Simultaneously, reconfigurable strategies and bidirectional motion capability further expand the diversity of driving solutions while BRSA is performing bulb-twisting and gripping tasks. The proposal of this study provides theoretical reference for deep actuation of soft actuators, while also opening new research avenues for their structure reconfiguration and high adaptability under variable operating conditions.

JFR 2026-06-17

Design and Verification of a Multi‐Feature Wind Field Generation Device Based on Array‐Type Rotors

Wenqing Zhang, Ke Liang, Imran, Suiyuan Shen, Yu Chen, Jia Lv, et al.

UAVs / Aerial Robots · Control & Dynamics
Abstract

To address the limitations of existing wind testing equipment and enable realistic airflow simulation for unmanned aerial vehicle (UAV) flights, a novel rotor‐based wind wall system was developed. This system accurately replicates complex aerodynamic environments, allowing for precise, controlled indoor testing of UAV performance and aerodynamic behavior. It provides a reliable and versatile platform for UAV research under realistic yet reproducible conditions. The rotor array design was validated using computational fluid dynamics (CFD) simulations in ANSYS Fluent, employing the finite volume method under representative operating conditions. Results showed that beyond 0.4 m from the wall, the wind field achieved high uniformity, with deviations in the velocity streamlines normal to the plane below 0.01 m. Both uniform and non‐uniform wind profiles were attainable by adjusting the distance from the wall, without compromising adjacent airflow stability, confirming the design's effectiveness in generating controllable and stable airflow. Wind speed accuracy was evaluated by comparing simulation outputs with actual data from UAV flights. The system demonstrated a maximum error of less than 0.5 m/s and a mean error of 0.147 m/s, indicating strong agreement with real‐world conditions. These results validate the device's high accuracy and reliability in simulating realistic wind fields. The rotor‐based wind wall represents a significant advancement in indoor aerodynamic testing, enabling the detailed investigation of UAV performance across diverse wind conditions. Its precise control, stability, and fidelity make it a valuable tool for advancing UAV design, control algorithms, and flight testing methodologies in a safe and repeatable environment.

RA-L 2026-06-01

Force-Level Fusion of a Diffusion Transformer Model With a Learnable Social Force Model Based Elastic Strip for Safe Trajectory Prediction

Vishnu Dev Tripathi, Vaibhav Malviya, Rahul Kala

Robot Learning
Abstract

Trajectory prediction has been solved by using both physics-based methods (PbM) and learning-based methods (LbM). PbM cannot capture complex social interactions, while LbM are unreliable for out-of-distribution scenarios. Some popular approaches make an LbM learn the parameters of a PbM, but the overall system is constrained to operate within the rules modelled by the PbM (e.g., overtaking a vehicle with a lateral force tangential to the obstacle, unmodelled by the PbM). Similarly, some approaches operate LbM within safety bounds set by PbM, in which case a seemingly safe but interpretably wrong output of LbM can be accepted and executed. In this paper, we propose a force-level fusion of an LbM with a social force model (whose parameters are learned), where the forces of both models are added. The prediction is interpreted as an elastic strip whose internal forces ensure smoothness, while the cumulative social forces form the external forces. The proposed model has been tested on publicly available datasets, against characteristic behaviours (following, overtaking, etc.) using a behaviour-aware synthetic dataset, and out-of-distribution scenarios. The performance of our proposed model is compared with various state-of-the-art approaches, and it generated fewer errors.

RA-L 2026-06-01

Design, Modeling, and Validation of a 6-DoF Wearable Puncture Robot

Jianfeng Yao, Canhui Wu, Bang Liu, Zhuang Fu, Zi Fang, Fei Jing, et al.

Medical / Soft / Micro-Nano
Abstract

Puncture robots are increasingly pivotal for minimally invasive interventions. However, achieving a balance among compactness, dexterity and sufficient load capacity remains a challenge. This article presents the design, modeling, and validation of a novel 6-DoF wearable puncture robot intended for precise percutaneous procedures. The system features a dual-layer parallel mechanism for orientation and positioning, integrated with an oblique-roller feeding unit to realize spin and axial insertion. Kinematic models were derived to support precise pose regulation and needle steering. A fully functional prototype was constructed, and comprehensive benchtop experiments demonstrated that the robot generates a maximum axial feed force of 2.54 N, sufficient to penetrate viscoelastic biological tissues. Furthermore, phantom puncture tests validated a mean targeting accuracy of 1.33 mm under varying insertion angles. These results confirm that the proposed system effectively combines high-load capability with precise kinematic control, offering a promising solution for robot-assisted microsurgery.

RA-L 2026-06-01

A Novel Method With Encoder-Decoder for Cross-Sensor Adaptation in Surface Shape Sensing With Sparse Strain Sensors

Shuo Wang, Heng Luo, Dian Jin, Xiaoming Tao

Medical / Soft / Micro-Nano
Abstract

Performance variations in sensor arrays, caused by intrinsic differences or installation conditions, can lead to inconsistent results during shape sensing. To obtain accurate results, a large amount of data is usually required, and a separate model must be retrained for each sensor array, thereby increasing the cost and time of data acquisition, transmission, and computation. To address this issue, this work proposes an encoder-decoder architecture for surface shape sensing based on sparse strain sensors and further incorporates meta-learning and few-shot adaptation strategies to enable adaptation across different groups of sensor arrays. Experimental results demonstrate that, after the cross-sensor adaptation, a newly deployed sensor array achieves a sensing error of approximately 4.0 mm relying on less than 5.0% newly labeled data and requiring an adaptation time of under 1 s, which represents a substantial improvement from 23.0 mm error without adaptation and 20-minute data collection time required to train a new model. Moreover, the number of points with errors below 5.0 mm increased by more than 65.0%. These results indicate that the proposed method can substantially reduce the cost and training burden of surface shape sensing, and it has broad potential applications in soft robotics and wearable devices.

RA-L 2026-06-01

Tension-Based Dynamic Control for Rope-Driven Robots Under Edge Collisions

Myeongjin Choi, Sahoon Ahn, Doyoung Park, Hwa Soo Kim, TaeWon Seo

Control & Dynamics
Abstract

This paper presents a tension-based dynamic control framework with torque-level tension tracking for a rope-driven ascending robot (RDAR), which differs from the position-based control schemes used in previous RDAR studies. The robot operates in unstructured environments where rope–edge collisions introduce significant disturbances such as anisotropic friction and tension loss. To address these effects, a hierarchical control architecture is developed that integrates a sliding-mode controller with adaptive gain (SMC-AG) and a disturbance observer (DOB). The high-level controller generates tension references using real-time tension feedback to reject collision-induced disturbances, while the low-level controller directly tracks these tensions in the motor-torque domain via PID control with model-based feedforward compensation derived from ascender dynamics. Lyapunov-based analysis guarantees closed-loop stability. A compact rope–edge experiment characterizes collision-induced friction and supports the disturbance modeling assumptions. Real-time tension regulation further mitigates rope hysteresis and improves responsiveness. Experimental results across various trajectories and speeds show that the proposed controller reduces the RMS tracking error by up to $80\%$ and the peak error by $20\%$ compared to conventional position-based PID with feedforward compensation, demonstrating strong robustness against edge interactions.

RA-L 2026-06-01

SplatXtRact: Tractable Gaussian Splatting via Open World Region-of-Interest Extraction and Refinement

Hannah Schieber, Constantin Kleinbeck, Angela P. Schoellig, Stefan Leutenegger, Daniel Roth

Perception and Sensing
Abstract

We present a task-conditioned refinement for 3D Gaussian Splatting (GS) that enables robots or human operators to selectively extract task-relevant regions of a learned scene. Given a pre-trained GS map, our approach supports local region-of-interest (ROI) refinement, preserving global map consistency while meeting the near-real-time constraints required for interactive robotic perception. The framework decouples semantic ROI selection from initial GS optimization, allowing flexible integration with external and novel perception models. We evaluate our approach on indoor and outdoor data (TUM RGB-D, MipNeRF360), demonstrating higher novel-view synthesis quality than the state of the art, reduced artifacts, and bounded latency suitable for human-in-the-loop operation.

RA-L 2026-06-01

Velocity Obstacle-Based Control Barrier Function for Safety-Critical Predictive Control

Francesco Trotti, Daniele Meli

Control and Dynamics
Abstract

This paper introduces the Velocity Obstacle-based Control Barrier Function (VO-CBF), a novel safety framework bridging geometric collision prediction with dynamic control. While classical Velocity Obstacles ignore system dynamics, the VO-CBF encodes collision cones directly into the dynamic state space. By natively coupling position and velocity, this formulation mathematically reduces the constraint's relative degree from two to one for force-controlled systems, fundamentally eliminating the solver conservatism and input saturation conflicts associated with standard high-order predictive barriers. We propose a hybrid Nonlinear Model Predictive Control (NMPC) architecture that enforces strict discrete-time invariance for the immediate step and affine approximations across the prediction horizon, balancing formal safety guarantees with computational tractability. Comparative simulations and hardware experiments demonstrate that the VO-CBF effectively throttles velocity to prevent collisions in high-inertia scenarios where standard kinematic baselines fail.
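To make the collision-cone idea concrete, here is a minimal Python sketch of a barrier-style value built from a planar velocity obstacle. It is not the paper's VO-CBF formulation, which the abstract does not spell out. The function name `collision_cone_barrier`, the disc-shaped obstacle, and the sign convention (a non-negative value means the relative velocity points outside the collision cone) are assumptions made for illustration.

```python
import math


def collision_cone_barrier(p_rob, v_rob, p_obs, v_obs, radius):
    """Return a collision-cone barrier value for a planar robot and a disc obstacle.

    Hypothetical helper, not taken from the paper. With p = p_obs - p_rob and
    v = v_obs - v_rob, the value is

        h = p . v + |v| * sqrt(|p|^2 - radius^2)

    h >= 0 means the relative velocity lies outside the cone of directions
    that lead to a collision; h < 0 means it lies inside that cone.
    """
    px, py = p_obs[0] - p_rob[0], p_obs[1] - p_rob[1]
    vx, vy = v_obs[0] - v_rob[0], v_obs[1] - v_rob[1]
    dist_sq = px * px + py * py
    if dist_sq <= radius * radius:
        raise ValueError("robot is already inside the obstacle's safety radius")
    speed = math.hypot(vx, vy)
    return px * vx + py * vy + speed * math.sqrt(dist_sq - radius * radius)


# Static obstacle 5 m ahead with a combined safety radius of 1 m.
head_on = collision_cone_barrier((0, 0), (1.0, 0.0), (5, 0), (0, 0), 1.0)
veering = collision_cone_barrier((0, 0), (1.0, 0.5), (5, 0), (0, 0), 1.0)
print(head_on < 0, veering > 0)
```

Because the value depends on both position and velocity, a force-controlled system influences its time derivative directly through acceleration, which is the intuition behind the relative-degree reduction the abstract describes.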

RA-L 2026-06-01

A Soft Worm Robot for Sub-Centimeter Scale Pipeline Inspection

Boyi Xu, Ke Yi, Yitong Zhou

Legged / Quadruped Robots
Abstract

Inspection of sub-centimeter pipelines with continuous spatial curvatures remains challenging due to extreme geometric confinement. This paper presents an ultralight, trunkless worm-inspired robot actuated by a single twisted and coiled actuator (TCA) specifically optimized for short-range inspection within localized extreme spatial bottlenecks. By integrating actuation and structural support into a single TCA and eliminating rigid trunks and auxiliary recovery mechanisms, the robot achieves a compact and lightweight design, with a mass of 0.5–1.1 g, a length of 32 mm, and a diameter of 4–10 mm. Directional peristaltic locomotion is enabled by a wedge-based micro silicone frustum foot that provides passive frictional anisotropy without complex control. A simplified mechanics-based anchoring model captures the dominant directional friction asymmetry induced by foot geometry and guides parameter optimization. Experiments demonstrate a maximum friction anisotropy ratio of 4.75, enabling stable crawling at 0.40 mm/s horizontally and 0.13 mm/s vertically in 4–9 mm pipelines, including complex three-dimensional curved configurations. Blind inspection experiments with an onboard micro-camera further validate the robot's potential for inspection in highly confined pipeline environments.

RA-L 2026-06-01

RWTCbot: A New Wall-Climbing Robot Capable of Wall-to-Wall Transitions At Internal and External Right Angles in All Scenarios

Ran Jiao, Liang Chen, Xinyang Luhan, Lili Hu, Heng Zhu, Minglu Zhang, et al.

Legged / Quadruped Robots
Abstract

This letter introduces RWTCbot, a magnetic-adhesion wall-climbing robot capable of performing all types of internal and external right-angle wall transitions. The robot is equipped with an auxiliary adhesion module that provides adjustable magnetic support and a passive rotary joint that allows relative rolling between the robot's front and rear sections. Building on these features, we developed a control algorithm that enables RWTCbot to automatically navigate all types of right-angle transitions, including challenging cases such as moving between walls and ceilings or between two walls that are perpendicular to each other during horizontal travel tasks. A static analysis of RWTCbot's wall-transition behavior was conducted, and comprehensive experimental tests across all scenarios confirmed its reliable transition performance.

RA-L 2026-06-01

Agile 3-D Motion Primitives in a Fish-like Robot

Prashanth Chivkula, Kartik Loya, Phanindra Tallapragada

Legged / Quadruped Robots
Abstract

Fish-like robots have been widely explored for underwater applications, but many existing designs are tied to very specific morphologies such as that of a tuna, dolphin, or jellyfish, which limits their versatility and often requires complex, multi-actuator control to achieve full three-dimensional motion. These robots often struggle to match the agility of their biological counterparts. Moreover, most of these robots rely on a single method of generating undulatory locomotion, constraining both their maneuvering capabilities and their operational efficiency to a single performance regime. In this work, we present a novel fish-like robot with an unbalanced rotor, an actuated flexible tail, and two independently controlled fins to achieve three-dimensional motion. The design enables multiple modes of actuation for generating motion primitives, thereby expanding operational efficiency and maneuvering range while maintaining actuator redundancy. We integrate these undulatory motion primitives with attitude motion primitives to demonstrate complex maneuvers.

RA-L 2026-05-20

Machine Learning-Enabled Soft Finger Grasping With a Kinesthetic Sense of Touch

Stephanie O. Herrera, Tae Myung Huh, Dejan Milutinović

Manipulation and Robot Arms · Perception and Sensing · Control and Dynamics
Abstract

This paper presents a contact-sensitive fingertip pinch grasping method for a pneumatic soft gripper that removes the need for tactile sensors at contact points, which are points of wear and tear for the sensors. We propose a reference tracking feedback control for bending-sensitive measurements and use machine learning to detect contacts in the process of grasping. The machine learning model is trained on signals from the feedback control loop, including the reference, the reference tracking error, and the controller output signals. We evaluate this method with 140 grasping experiments on a physical system using 7 objects of various sizes and positions, and compare it with a baseline strategy for soft gripper grasping. Our grasping approach achieves a 97.9% grasping success rate, with the robotic arm able to lift objects vertically without losing contact with them.

Sci. Robotics 2026-04-22

From autonomy to alliance: Robotic foundation models must learn with us, not just for us

Sharmita Dey, Robert Riener, Strahinja Dosen, Stefano V. Albrecht

Robot Learning
Abstract

This Viewpoint urges reimagining of robotic foundation models, from treating the robot as a solitary, omnipotent agent to embracing a multiagent, alliance-aware paradigm. Alliance-aware models learn with humans and other robots, not merely for them, by embedding mechanisms that foster social interaction and generalization across heterogeneous partners. We outline six design pillars that cultivate such collaborative intelligence: interaction priors, partner modeling (machine theory of mind), modular and composable policies, norm adaptation, trust-aware memory, and communication. Together, these pillars empower robots to fluidly switch social roles, adapt to unfamiliar collaborators, and coordinate robustly within dynamic multiagent ecologies spanning homes, factories, clinics, and field operations.

Sci. Robotics 2026-04-22

Robot farm elegy

Robin R. Murphy

Humanoid Robots
Abstract

The 2025 novel Mechanize My Hands for War features humanoid robots for agriculture.

JFR 2026-06-16

Deep Learning Based Dirt Detection and Cleanliness Evaluation in Autonomous Indian Domestic Concrete Water Tank Cleaning Robot

Rajesh Kannan Megalingam, Kusumanchi Surya Shanmukh, Aditya Ashvin, Pochareddy Nishith Reddy, Aryan Kurungadathil, Shree Rajesh Raagul Vadivel

Robot Learning · Perception and Sensing
Abstract

Water tanks are vital in supplying and storing water for both domestic and industrial needs. Improper maintenance of water tanks leads to water contamination. Contaminated water is one of the main contributors to skin and hair diseases. It is crucial to ensure that water tanks are kept clean and free of contaminants, especially in residential and commercial settings. Currently, there is hardly any method for dirt or stain detection and cleanliness evaluation after cleaning the tanks. In addition, manual methods are still widely used to clean these tanks, and the process is tiresome and time‐consuming. Autonomous water tank cleaning robots have been suggested to address these challenges, performing the cleaning process with minimal human intervention, thereby reducing time, effort, and risks. Although research has been conducted in this field, there has been limited progress in integrating artificial intelligence (AI)‐based cleanliness evaluation modes into robotic systems. This research proposes an autonomous robot capable of cleaning both the floor and walls of concrete water tank structures. The robot incorporates a custom‐trained modified U‐Net model to detect and clean residual dirt patches missed during the initial cleaning. The models were trained and validated on a custom dataset of 7300 images. Among all the trained segmentation models, the modified U‐Net model achieved a validation accuracy of 96%.

JFR 2026-06-09

RGPB‐Planner: A Real‐Time Gaussian Potential B‐Spline Trajectory Planner for Unmanned Aerial Vehicles in Complex Environments

Fan Yang, Qiang Lu, Xiongding Liu, Na Huang, Botao Zhang, Youngjin Choi

Drones / Aerial Robots · Navigation / SLAM / Autonomous Driving · Control and Dynamics
Abstract

In this paper, a real‐time Gaussian potential B‐Spline planner (RGPB‐Planner) is proposed to address the problem of safe flight planning for unmanned aerial vehicles (UAVs) in complex environments. Existing trajectory optimization approaches, such as B‐Spline and safe‐corridor methods, can generate smooth trajectories. However, these approaches often suffer from high computational cost, local collisions near obstacles, or instability in dense environments. To fill this gap, the proposed RGPB‐Planner integrates a Gaussian potential with B‐Spline optimization, which ensures real‐time performance, smoothness, and safety by constraining control points within the safe potential region. The proposed RGPB‐Planner comprises two modules: the front‐end and back‐end modules. In the front‐end module, a local dynamic A‐star graph search is used to find the shortest path for UAVs to safely reach target points in complex environments. In the back‐end module, the front‐end shortest path is optimized using a Gaussian potential B‐Spline trajectory optimization method that considers the UAV's kinematics and dynamics constraints. Finally, simulation results demonstrate that the proposed RGPB‐Planner reduces flight time by 15%–19%, increases average velocity by 13%–21%, and shortens trajectory length by 2%–3% compared with Fast‐Planner and Ego‐Planner, while real‐world experiments validate its real‐time capability and flight safety. Therefore, the proposed RGPB‐Planner provides a reliable solution to ensure safe flight planning in complex environments.
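The two ingredients named in the abstract, a B-spline trajectory and a Gaussian potential around obstacles, can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not give the planner's actual cost function, so the uniform cubic basis, the function names `cubic_bspline_point` and `gaussian_potential`, and the `amplitude` and `sigma` parameters are all assumptions.

```python
import math


def cubic_bspline_point(ctrl, u):
    """Evaluate one uniform cubic B-spline segment at u in [0, 1].

    ctrl is a sequence of four control points (tuples of equal dimension).
    The four basis weights are non-negative and sum to 1, so the point is a
    convex combination of the control points.
    """
    weights = (
        (1 - u) ** 3 / 6,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6,
        u ** 3 / 6,
    )
    dim = len(ctrl[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, ctrl)) for k in range(dim))


def gaussian_potential(point, obstacle, amplitude=1.0, sigma=0.5):
    """Hypothetical repulsive potential that peaks at the obstacle centre."""
    d_sq = sum((a - b) ** 2 for a, b in zip(point, obstacle))
    return amplitude * math.exp(-d_sq / (2 * sigma ** 2))


ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
obstacle = (1.5, 0.4)
# Penalise control points that sit close to the obstacle.
penalty = sum(gaussian_potential(p, obstacle) for p in ctrl)
print(cubic_bspline_point(ctrl, 0.5), penalty)
```

Since each curve point is a convex combination of four control points, keeping the control points in a low-potential region also keeps that curve segment inside their convex hull. That convex-hull property is why constraining control points is a common way to reason about B-spline safety.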

RA-L 2026-05-18

“I'm Not Mad, Just Focused”: Understanding Human Emotions in Human-Robot Collaboration

Seung Chan Hong, Dana Kulić, Leimin Tian

Robot Learning · Perception and Sensing · Human-Robot Interaction / Teleoperation
Abstract

Human-robot collaboration (HRC) can benefit from robots' abilities to interpret human emotional states. However, current emotion recognition (ER) models in HRC often fall short, particularly due to their reliance on acted datasets and single-modality inputs like facial expressions. We propose a novel vision language model (VLM)-based ER system that leverages contextual understanding to improve emotion interpretation in HRC. We first evaluate the VLM-ER system by assessing its semantic and sentiment similarity with human annotations on an existing HRC dataset. Then, in a user study with a service robot in a collaborative delivery task, we evaluate the effects of modulating the robot's behaviour based on the user's emotional state inferred by the VLM-ER system. The results show that the proposed VLM-ER system achieves higher semantic similarity and positive sentiment alignment with human annotations compared to a baseline convolutional neural network-based system. Further, participants in the user study preferred emotion-adaptive robot behaviour facilitated by the VLM-ER system.

RA-L 2026-05-18

Learning One-Step Inverse for Performance Improvement of Nonlinear Control Systems: Application to Quadrotor Control

Hamin Chang, Jonathan Lane, Nak-seung Patrick Hyun

Drones / Aerial Robots · Robot Learning · Control and Dynamics
Abstract

This letter proposes a learning-based control framework to enlarge the region of attraction of controllers for nonlinear systems subject to model uncertainties. When a nominal controller designed for an assumed nominal model is applied to a physical system, discrepancies between the model and the actual dynamics act as perturbations and degrade stability and performance. To mitigate this issue, we construct a controller that learns a one-step inverse of the dynamics via kernel methods. Unlike conventional methods that augment the nominal controller, the proposed approach does not rely on specific structural assumptions on the system dynamics and uncertainties. We provide a theoretical analysis guaranteeing that, under sufficient data-density conditions, the proposed controller yields smaller perturbations than those of the nominal controller in the closed-loop system. As a proof of concept, the effectiveness of the proposed method is experimentally validated on a micro-quadrotor. The results demonstrate an enlarged region of attraction compared to the nominal controller, even when using limited data (500 training input-output pairs) from a single episode.

JFR 2026-06-08

Mobile Manipulator Robot for Autonomous In‐Situ Soil Measurements in Chile Pepper Cultivation

Roman Langenscheidt, Mahdi Haghshenas‐Jaryani, Heinz Bernhardt

Manipulation and Robot Arms · Perception and Sensing · Control and Dynamics
Abstract

Chile pepper farming in New Mexico faces critical constraints from water scarcity, soil salinity, and labor shortages. Precision agriculture technologies enabling data‐driven resource management offer promising solutions. This paper presents an autonomous in‐situ soil sensing system integrated with a mobile manipulator robot for automated soil data collection. The main contribution is a unified, failure‐aware autonomous soil sensing system that integrates vision‐based surface characterization, adaptive force‐controlled sensor insertion, and insertion monitoring with failure detection and recovery into a single low‐cost field‐deployable robotic platform. The system comprises a two‐stage visual alignment process using RGB‐D camera data to adapt to terrain slope and identify obstacle‐free insertion sites, a force‐based contact detection mechanism to determine sensor‐soil contact, and adaptive impedance control with Kalman filter‐based soil stiffness estimation for controlled sensor insertion. The system is implemented on a mobile platform with a six DoF manipulator and TEROS 12 soil sensor. Field evaluation across 41 sensing operations in varying soil conditions during the early chile pepper season demonstrated a 75.6% success rate, with soil measurements correctly obtained upon full sensor insertion. In 90.2% of sensing operations, the system made correct decisions, including aborts when necessary. Main limitations included the inability to detect flush surface obstacles, occasional false contact detections, and incorrect insertion completion verification. Nevertheless, the results demonstrate the feasibility of autonomous in‐situ soil sensing in chile pepper cultivation, providing a foundation for fully autonomous soil monitoring. The methods and approaches developed in this work may extend to other crops requiring in‐situ soil measurements.

JFR 2026-06-08

A Design Specifications Template for Wearable Haptic Interfaces: A Case Study for Robotic Gripper Applications

Amr M. El‐Sayed

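Manipulation and Robot Arms · Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract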

Wearable haptic interfaces are increasingly important for enhancing human-robot collaboration, particularly in decision-critical tasks that require intuitive and reliable interaction. Despite advances in wearable systems, existing designs often lack a structured framework that systematically integrates sensing, actuation, control, and user‐centered considerations, limiting consistency, scalability, and performance across robotic applications. This paper presents a design specifications template for wearable haptic interfaces, providing a structured approach to guide designers in addressing key parameters, including user functional needs, ergonomic requirements, and technical design data. A focused review of related technologies covering exoskeletons and wearable haptic devices, sensing technologies for touch, and recent robotic grippers was conducted to inform the template and identify gaps in current design practices. The template was validated using two complementary approaches. Theoretical validation involved mapping two existing wearable haptic systems to the template, revealing that coverage of user characteristics and functional requirements ranged from 25% to 37.5%, highlighting the need for more systematic consideration of human factors. Practical validation was performed by designing, fabricating, and evaluating a three‐finger wearable haptic device integrated with a robotic gripper, demonstrating improved coverage of user‐centered and technical parameters and confirming the template's practical applicability. Overall, the proposed framework provides a systematic, application‐driven methodology for developing reliable and scalable wearable haptic interfaces. By enabling designers to integrate human factors, device functionality, and technical specifications at the pre‐design stage, it supports improved human‐robot collaboration and sets a foundation for future standardized and adaptable haptic systems in teleoperation, rehabilitation, and robotic manipulation tasks.

RA-L 2026-06-04

Toward Real-time High-fidelity Immersive Robotic Telepresence via Online 3D Gaussian Splatting

Jingxin Du, Mo Xu, Yaxin Hu, Xuebin Sun, Bilge Mutlu, Kevin Ponto

Abstract

Immersive robotic telepresence offers new possibilities for users to feel like they are present in remote locations. This provides benefits for many application areas, such as remote inspection, virtual tourism, and hazardous environment monitoring. However, due to limitations in their underlying technologies, architecture design, and hardware constraints, existing robotic telepresence systems often force a trade-off between visual quality, system performance, and display immersiveness. Furthermore, much of the existing research exhibits a narrow focus, often centering on improving and evaluating a single technical aspect, such as reconstruction fidelity or rendering speed, while failing to capture the holistic user experience. We present a novel online 3D Gaussian Splatting (3DGS)-based virtual reality (VR) robotic telepresence system that more effectively balances these competing objectives through a careful co-design spanning the system's algorithms, architecture, and pipelines across the reconstruction, rendering, and streaming processes. We also introduce a comprehensive evaluation framework that goes beyond reconstruction fidelity and rendering speed to evaluate key interaction-critical factors such as convergence speed and system latency. Empirical results demonstrate that our system outperforms state-of-the-art approaches in terms of both fidelity and performance, while also providing an immersive experience.

RA-L 2026-05-29

Wrench-Feasible Workspace Analysis of Mobile Cable-Driven Parallel Robot Considering Both Tipping and Rotation Stability

Byeong-Geon Kim, Su-Won Jeong, Kyoung-Su Park

Control and Dynamics
Abstract

This paper introduces an advanced workspace algorithm for a Mobile Cable-Driven Parallel Robot (MCDPR) with eight cables and four mobile platforms. The algorithm aims to calculate a more accurate workspace and derive a stable set of cable tensions. Through the Tension Distribution Algorithm (TDA), which considers tipping and rotating conditions, it calculates the minimum set of tensions that satisfy the static equilibrium of the end effector (EE) and the mobile platforms. According to simulation results, the practically usable workspace, when considering both tipping and rotating conditions, is smaller than the workspace obtained solely through tension conditions. This indicates that the workspace satisfying the static equilibrium of the mobile platforms in addition to that of the EE is smaller, suggesting the necessity of a workspace algorithm for MCDPR. Moreover, the non-overlapping areas between the workspaces considering only the tipping condition and only the rotating condition emphasize the need to consider both conditions simultaneously. To validate the theory, a prototype MCDPR consisting of eight cables and four mobile platforms was developed. The workspace verification experiments conducted with the prototype demonstrated a high similarity to the theoretical results, with an IoU exceeding 90%. Furthermore, tipping occurred in the workspace without considering the tipping condition, and rotation occurred in the workspace without considering the rotating condition, confirming consistency with the theoretical predictions of this study.
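The IoU figure quoted above compares a predicted workspace with a measured one. As a rough illustration of the metric itself, the Python sketch below computes IoU over discretised workspace cells. The grid discretisation, the function name `workspace_iou`, and the toy cell sets are assumptions; the abstract does not say how the authors sampled the workspace.

```python
def workspace_iou(predicted_cells, measured_cells):
    """Intersection over union of two sets of discretised workspace cells."""
    predicted, measured = set(predicted_cells), set(measured_cells)
    union = predicted | measured
    if not union:
        raise ValueError("both workspaces are empty")
    return len(predicted & measured) / len(union)


# Toy data: a 10 x 10 predicted region versus a measurement missing one row.
predicted = {(x, y) for x in range(10) for y in range(10)}
measured = {(x, y) for x in range(10) for y in range(1, 10)}
print(workspace_iou(predicted, measured))  # 90 shared cells / 100 cells in the union
```

IoU penalises both cells predicted as reachable but not reached and cells reached but not predicted, so a high value indicates agreement in both directions.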

RA-L 2026-05-29

Using Robotics to Improve Transcatheter Edge-to-Edge Repair of the Mitral Valve

Léa Pistorius, Namrata U. Nayar, Phillip Tran, Sammy Elmariah, Pierre E. Dupont

Medical / Soft / Micro-Nano
Abstract

Transcatheter valve repair presents significant challenges due to the mechanical limitations and steep learning curve associated with manual catheter systems. This paper investigates the use of robotics to facilitate transcatheter procedures in the context of mitral valve edge-to-edge repair. Specifically, the study builds upon the MitraClip™ (Abbott, USA), a widely used commercial system for mitral valve repair, adapting it for robotic actuation to evaluate potential performance improvements. The complex handle-based control of a clinical repair device is replaced by intuitive robotic joint-based control via a game controller. Manual versus robotic performance is analyzed by decomposing the overall device delivery task into motion-specific steps and comparing capabilities on a step-by-step basis in a phantom model of the heart and vasculature. Metrics include procedure duration and clip placement accuracy. Results demonstrate that the robotic system can reduce procedural time and motion errors while also improving accuracy of clip placement. For the most commonly treated middle leaflet segment, mean delivery time is reduced from 146.6 s to 76.2 s while mean clip positioning error is reduced from 3.1 mm to 0.6 mm. These findings suggest that robotic assistance can address key limitations of manual systems, offering a more reliable and user-friendly platform for complex transcatheter procedures.
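The reported means can be restated as relative reductions. The short Python check below computes them from the four values quoted in the abstract. The helper name `relative_reduction` is hypothetical, and the resulting percentages are derived here, not reported by the authors.

```python
def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


time_cut = relative_reduction(146.6, 76.2)   # mean delivery time, seconds
error_cut = relative_reduction(3.1, 0.6)     # mean clip positioning error, mm
print(f"delivery time: -{time_cut:.1f}%, positioning error: -{error_cut:.1f}%")
# → delivery time: -48.0%, positioning error: -80.6%
```

In other words, for the middle leaflet segment the robotic system roughly halves delivery time and removes about four fifths of the positioning error in the phantom model.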

RA-L 2026-05-29

U-DiffPlan: Uncertainty-Aware Trajectory Prediction and Planning via Diffusion for Strategic Autonomous Racing

Bogyeong Suh, Jonghak Bae, Jaehyun Lim, Jinsung Kim, Kunhee Ryu, Jongeun Choi

Control and Dynamics
Abstract

Autonomous racing demands not only high-speed trajectory planning but also strategic reasoning under uncertainty. Unlike urban driving, autonomous racing involves more complex and adversarial strategic interactions. In this work, we introduce U-DiffPlan, an uncertainty-aware prediction and planning framework designed for competitive racing scenarios. U-DiffPlan consists of two modules. The first, a predictor module, jointly estimates the future trajectories and the strategies of the target vehicle using a conditional diffusion model and an auxiliary encoder. The second, a planner module, determines the ego vehicle's optimal strategy using a mixed-strategy game-theoretic formulation and plans its trajectory via chance-constrained time-optimal model predictive control, which incorporates prediction uncertainty through chance constraints. Extensive experiments across diverse racing scenarios demonstrate that U-DiffPlan outperforms baseline approaches in trajectory prediction and race competitiveness. The proposed model achieves more effective overtaking and blocking strategies in the presence of probabilistic opponent behavior, while remaining computationally efficient for real-time inference.

RA-L 2026-05-29

LOPAL: Local Performance-Aware Active Learning From Imperfect Demonstrations

Johannes Heidersberger, Shail Jadav, Dongheui Lee

Human-Robot Interaction / Teleoperation
Abstract

Learning from Demonstration (LfD) enables intuitive robot skill acquisition by allowing robots to learn directly from human task demonstrations. However, current methods often fail to address the fact that, due to suboptimal and inconsistent human behavior, quality can vary within each demonstration. Therefore, we introduce LOPAL (LOcal Performance-aware Active Learning), an active learning approach that leverages this local demonstration quality information. Our approach consists of two synergistic components. First, a local performance-driven LfD method uses a Gaussian Mixture Model (GMM) to encode both the demonstrated trajectories and their associated local quality assessments. This enables the generation of trajectories that outperform the imperfect demonstrations by utilizing complementary local data of high performance. Second, active data acquisition allows improvement beyond the imperfect demonstrations by collecting additional informative samples. In areas missing good data, the user is actively requested to provide corrections through a shared autonomy (SA) mechanism, while the robot autonomously executes the learned behavior. The efficacy of LOPAL was validated in both a simulation and a real-world experiment. The results from a real-world pipe inspection task showed that the proposed approach can achieve up to $27.31\%$ improvement in task performance while also reducing the effort required to collect the demonstrations.

RA-L 2026-05-29

Obstacle Avoidance of a Snake Robot by Partial Deformation of Crawler Gait for Overhead and Lateral Constraints

Hana Kumakura, Ching Wen Chin, Motoyasu Tanaka

Legged / Quadruped Robots
Abstract

Wheel-less snake robots can traverse rough terrain using crawler gait. In this study, the target curve for the crawler gait is partially deformed so that the robot can traverse confined spaces without contacting overhead or lateral obstacles. Four deformation methods with different slippage characteristics are proposed. The effectiveness of the proposed methods was evaluated through real-robot experiments in planar and random-step environments, focusing on slippage, the pass-through success rate, and locomotion duration. The experimental results confirmed that the proposed methods enable effective confined-space pass-through motion on rough terrain. In particular, the deformation methods designed with slippage suppression in mind not only reduced slippage but also increased the pass-through success rate and shortened the locomotion duration.

JFR 2026-04-27 · Cited by 1

A Field‐Adaptive Mechanical Weeding System Coupling Oscillating Pneumatic Mechanism With Deep Learning for Intra‐Row Weed Control in Lettuce

Rui‐Feng Wang, Chang‐Tao Zhao, Yu‐Hao Tu, Zi‐Qiu Chen, Wen‐Hao Su

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception and Sensing · Multi-Robot / Swarms
Abstract

Intra‐row weeding is a critical yet unresolved problem in precision horticulture, where crops and weeds exhibit tight spatial proximity and strong visual similarity under fluctuating field conditions. Addressing this challenge requires not only reliable crop‐weed discrimination but also accurate crop‐center localization tightly coupled with fast and safe actuation. This study introduces a field‐adaptive intra‐row weeding system that integrates an oscillating pneumatic mechanism with a purpose‐designed deep learning framework, LettPointNet. LettPointNet leverages multi‐scale feature fusion and a geometric center‐point constraint to enhance spatial robustness, yielding 95.1% precision, 96.8% recall, 95.9% F1‐score, 98.3% mAP50, and 90.6% mAP at a modest 7.7 GFLOPs, thereby supporting real‐time embedded operation. Relative to lightweight YOLOv11n/12n baselines, LettPointNet improves F1‐score and mAP by 3.4–3.9 and 6.5–6.6 percentage points, respectively. Conveyor‐based evaluations (0.05–0.20 m/s; three weed‐density levels) demonstrated 84.6% lettuce localization and 81.1% weeding performance, with response‐surface analysis confirming significant interactions between speed and density. Polytunnel trials further validate system robustness, achieving 82.2% and 80.9% weeding rates under favorable and low‐light conditions, respectively, with minimal crop damage (1.99% and 2.57%). Collectively, the results establish that precise perception‐actuation coupling enables reliable, real‐time intra‐row weeding and offers a viable pathway toward automated and sustainable protected‐crop management.
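The detection metrics quoted for LettPointNet are internally consistent, which is easy to verify because F1 is the harmonic mean of precision and recall. The Python snippet below is a small sanity check on the reported numbers; the helper name `f1_score` is an illustrative choice, not something from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f1_score(95.1, 96.8), 1))  # → 95.9, matching the reported F1-score
```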

IJRR 2026-05-01

Explicit force-based control strategies for compliant robot-assisted ultrasound imaging

Adrian Piedra, R Brooke Jeffrey, Oussama Khatib

Human-Robot Interaction / Teleoperation · Control and Dynamics
Abstract

In this paper, we present a robotic ultrasound system that implements explicit contact force control and force-based full probe orientation optimization to achieve stable, responsive, and high-quality ultrasound imaging. Traditional robotic ultrasound systems often rely on implicit force control methods, such as admittance control, which are limited by their underlying motion-control loops. In contrast, our approach directly regulates contact force and moment at the end-effector, enabling rapid adaptation to heterogeneous tissue properties and dynamic environments. We benchmark the proposed explicit force controller against a state-of-the-art integral adaptive admittance controller, demonstrating a significant reduction in phase lag from 121° to 5.76° at force tracking frequencies exceeding typical respiratory rates. In a second set of benchmarking experiments, compared to the admittance-based method, the explicit force controller achieves a reduction in average force tracking error of 77.7% when the phantom is moving toward the transducer, and 72.6% when the phantom is moving away. We integrate the explicit force and moment controller into a six-DoF haptic framework that renders physically-grounded interaction forces to the operator while the robot autonomously regulates contact force and optimizes probe alignment based on acoustic coupling. Validation across static and dynamic scans, as well as under external perturbations, shows that the system consistently maintains target force and moment profiles, aligns the probe with local surface normals, and adapts to changing contact conditions. Experimental results demonstrate that explicit force-based control improves ultrasound image quality, as quantified by confidence maps, compared to a manual haptic scan. These findings support the use of explicit force and moment control as an effective approach for robotic ultrasound imaging.

T-RO 2026-04-27

A New Approach to Motion Planning in 3-D for a Dubins Vehicle: Special Case on a Sphere

Deepak Prakash Kumar, Swaroop Darbha, Satyanarayana Gupta Manyam, David W. Casbeer

Drones / Aerial Robots · Navigation / SLAM / Autonomous Driving
Abstract

In this article, a new model for 3D motion planning, applicable to aerial vehicles, is proposed to connect an initial and final configuration subject to pitch rate and yaw rate constraints. The motion planning problem for a curvature-constrained vehicle over the surface of a sphere is identified as an intermediary problem to be solved, and it is the focus of this paper. In this article, the optimal path candidates for a vehicle with a minimum turning radius $r$ moving over a unit sphere are derived using a phase portrait approach. We show that the optimal path is $CGC$ or concatenations of $C$ segments through simple proofs, where $C \in \{L, R\}$ denotes a turn of radius $r$ and $G$ denotes a great circular arc. We generalize the previous result of optimal paths being $CGC$ and $CCC$ paths for $r \in (0, \frac{1}{2}] \cup \{\frac{1}{\sqrt{2}}\}$ to $r \leq \frac{\sqrt{3}}{2}$ to account for vehicles with a larger $r$. We show that the optimal path is of type $CGC$ or $CCCC$ for $r \leq \frac{1}{\sqrt{2}}$, and of type $CGC$, $CC_\pi C$, or $CCCCC$ for $r \leq \frac{\sqrt{3}}{2}$. Additionally, we analytically construct all candidate paths and provide the code in a publicly accessible repository.
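For readers unfamiliar with the $C$ and $G$ notation, the Python sketch below shows how the length of a $CGC$ candidate could be assembled from its three arcs on a unit sphere. It is not the authors' repository code. It assumes that $r$ is the Euclidean radius of the small circle traced during a turn (so $0 < r \leq 1$), and the function names are hypothetical.

```python
import math


def great_circle_arc_length(a, b):
    """Length of the shorter G arc between unit vectors a and b on a unit sphere."""
    cross = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    dot = sum(x * y for x, y in zip(a, b))
    # atan2 of |a x b| and a . b stays accurate for nearly parallel vectors.
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)


def turn_arc_length(r, turn_angle):
    """Length of a C segment: a small circle of radius r swept through turn_angle radians."""
    if not 0 < r <= 1:
        raise ValueError("on a unit sphere the turn radius must satisfy 0 < r <= 1")
    return r * turn_angle


def cgc_length(r, first_turn, g_start, g_end, second_turn):
    """Total length of a CGC candidate from its two turn angles and G-arc endpoints."""
    return (
        turn_arc_length(r, first_turn)
        + great_circle_arc_length(g_start, g_end)
        + turn_arc_length(r, second_turn)
    )


print(cgc_length(0.5, math.pi / 2, (1, 0, 0), (0, 1, 0), math.pi / 2))
```

A full planner would also have to solve for the turn angles and tangent points that connect the given start and end configurations, which is the part the article derives analytically.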

T-RO 2026-04-21

TensorTouch: Calibration of Tactile Sensors for High Resolution Stress Tensor and Deformation for Dexterous Manipulation

Won Kyung Do, Matthew Strong, Aiden Swann, Boshu Lei, Monroe Kennedy

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Advanced dexterous manipulation requires identifying and controlling multiple simultaneous contacts, including interactions with compliant objects where deformations are large. Raw optical tactile images are information rich but lack calibrated physical meaning, limiting cross-sensor use and real-world deployment. We present TensorTouch, a calibration framework that combines finite element analysis and learning to infer dense deformation and stress/force fields from a single tactile image. Across real sensors, TensorTouch achieves contact localization errors under 1.29 mm and mean force errors under 0.139 N per axis. In a multi-object selective grasp task with two simultaneously contacted objects (including identical cables), the system achieves up to 90.0% success. We further demonstrate robustness under repeated loading, yielding 38.295 dB PSNR between the initial tactile image and the image after 20,000 contacts. The learned model runs in real time at 95 Hz on an RTX 5090, proving to be suitable for contact-rich, dexterous manipulation.
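For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard peak signal-to-noise ratio between two equally sized grayscale images. It is a generic definition, not the TensorTouch evaluation code; the 8-bit peak value of 255 and the nested-list image format are assumptions.

```python
import math

def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the two images are closer; identical images give an infinite PSNR.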

T-RO 2026-04-21

AnyUser: Translating Sketched User Intent Into Domestic Robots

Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang

Manipulation & Robot Arms · Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract

We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions requiring no prior maps or models. Novel components include multimodal fusion for understanding and a hierarchical policy for robust action generation. Efficacy is shown via extensive evaluations: (1) Quantitative benchmarks on the large-scale dataset showing high accuracy in interpreting diverse sketch-based commands across various simulated domestic scenes. (2) Real-world validation on two distinct robotic platforms, a statically mounted 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL), performing representative tasks like targeted wiping and area cleaning, confirming the system's ability to ground instructions and execute them reliably in physical environments. (3) A comprehensive user study involving diverse demographics (elderly, simulated non-verbal, low technical literacy) demonstrating significant improvements in usability and task specification efficiency, achieving high task completion rates (85.7%-96.4%) and user satisfaction. AnyUser bridges the gap between advanced robotic capabilities and the need for accessible non-expert interaction, laying the foundation for practical assistive robots adaptable to real-world human environments.

T-RO 2026-04-21

Is Diversity All You Need for Scalable Robotic Manipulation?

Modi Shi, Li Chen, Jin Chen, Yuxiang Lu, Chiming Liu, Guanghui Ren, et al.

Manipulation & Robot Arms · Robot Learning · Perception & Sensing · Multi-Robot / Swarms
Abstract

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), challenging the conventional intuition of “more diverse is better”. Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, with scene diversity playing a more important role than skill diversity for robustness and generalization under distribution shifts; (2) multi-embodiment pre-training data is non-essential for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms, showing a desirable scaling property during fine-tuning and the potential to replace large-scale multi-embodiment pre-training; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can be confounding to policy learning, with action rate multimodality emerging as a key contributing factor. Based on this insight, we propose a distribution debiasing method to mitigate action rate ambiguity; the resulting GO-1-Pro achieves substantial performance gains of 15%, equivalent to using 2.5× pre-training data. Collectively, these findings provide new perspectives and offer practical guidance on how to scale robotic manipulation datasets effectively. The code will be released.

T-RO 2026-04-21

Large-Scale Multirobot Task Planning Using Efficient Hierarchical Reinforcement Learning

Xuan Zhou, Xiang Shi, Lele Zhang, Chen Chen, Hongbo Li, Lin Ma, et al.

Robot Learning · Multi-Robot / Swarms · Control & Dynamics
Abstract

Multi-robot task planning (MRTP) at scale in robotic mobile fulfillment systems (RMFS) remains a challenge due to the curse of dimensionality and complex dynamic properties. To address these challenges, we construct an end-to-end multi-robot task planner that scales to large systems by learning hierarchical planning policies. In this planner, we design a centralized hierarchical temporal task planning framework to mitigate the curse of dimensionality while ensuring timely dynamic response. Following this framework, we propose a novel cycle-constrained asynchronous temporal graph (CycATG) to provide a foundation for modeling the system dynamics. Based on the graph representation, we formulate the MRTP problem as a semi-Markov decision process (SMDP) that focuses solely on critical interaction points to improve computational and sampling efficiency. The policies in the SMDP are parameterized via a hierarchical temporal attention network with temporal embedding layers to enhance spatio-temporal feature extraction. Additionally, the decoder masks in this network naturally ensure that the generated actions strictly satisfy the required dynamic hard constraints. The above hierarchical policies are jointly optimized using an efficient hierarchical REINFORCE with rollout counterfactual baseline method. To further enhance generalization performance on unlearned instances while preventing catastrophic forgetting, we extend it with region expansion curricula. Experiments demonstrate that our planner outperforms state-of-the-art methods on different MRTP instances across simulated and real-world RMFS. It successfully scales to instances with up to 200 robots and 1000 retrieval racks on unlearned maps while maintaining performance advantages.

T-RO 2026-04-21

Deep Learning-Based Process Control of Microrobot Swarms Guided by Phase Diagrams

Yuezhen Liu, Yifan Wu, Yu Liu, Hui Chen, Yibin Wang, Xingzhou Du, et al.

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Medical / Soft / Micro-Nano · Multi-Robot / Swarms
Abstract

Microrobot swarms with locomotion dexterity and shape reconfigurability show immense potential in biomedical applications. Automatic control strategies are critical for the navigation of swarms in unstructured environments. Existing control methods mainly focus on initial and final states of swarms, and swarm process control designed to avoid undesired states throughout the control process has yet to be investigated. In this work, we develop a deep learning-based process control strategy for swarms guided by phase diagrams. Two deep neural networks are respectively built to model the swarm shape and kinematics. Control approaches based on precise swarm models are designed to automatically tune multiple swarm parameters. A phase diagram-based controller is proposed to guide the swarm reconfiguration while eliminating the coupling effects between swarm parameters. The swarm is enabled to accurately track predefined trajectories while performing continuous reconfiguration with desired states during the entire process. By integrating the process control of swarm pattern and locomotion, the swarm can dynamically adapt to constrained unstructured spaces and achieve robust collision avoidance.

T-RO 2026-04-21

Real-Time Dual-Arm Cooperative Manipulation Under Multiple Constraints: A Two-Stage Sampling MPC Approach

Tianqi Zhu, Jianliang Mao, Jun Yang, Shihua Li

Manipulation & Robot Arms · Multi-Robot / Swarms · Control & Dynamics
Abstract

This paper introduces a novel framework for reactive control in dual-arm cooperative robotic systems, addressing the significant challenges posed by high-dimensional, non-convex optimization demands, intricate kinematics, multimodal distributions, and the need for precise, synchronized coordination. The core of our approach is a two-stage sampling-based model predictive control, which integrates k-means, dual quaternions, and null-space control into a cohesive system. This integration enhances the system's ability to manage complex coordination tasks, such as obstacle avoidance and holding a water cup, while mitigating risks associated with local optima and reducing control jitter. Our framework not only improves performance and reliability, but also overcomes the traditional computational bottlenecks inherent in dual-arm coordination. These advancements are validated through extensive simulations and experiments, demonstrating the robustness and efficiency of our proposed methodology.

IJRR 2026-04-24

Multiscale deformable objects manipulation via wavelet-decomposed boundary element method

Junlei Hu, Dominic Jones, Majed Melibary, Jiannan Liu, Pietro Valdastri

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Robotic deformable object manipulation (DOM) faces critical challenges in industrial and medical applications due to under-actuation, unpredictable deformation, and partial observability. Model-free methods often suffer from unstable Jacobians arising from ill-conditioned observations, while physics-based models typically depend on precise parameters and volumetric meshing, limiting their real-time practicality. We propose a wavelet-boundary element method (BEM) framework that leverages multiscale wavelet descriptors to control 3D deformations directly from efficient feedback modalities, such as contours and curves. By coupling wavelets with BEM, we derive an analytical deformation Jacobian that functions independently of material stiffness (e.g., Young’s modulus), relying solely on an online-calibrated Poisson’s ratio. This mesh-free formulation significantly enhances real-time performance and robustness against sensor occlusion. Validated in simulation and on the da Vinci Research Kit (dVRK) with phantom and ex vivo animal tissue, our method achieves millimetre-level accuracy. Comparative studies against Fourier-based, model-free, and online finite element method (FEM) approaches demonstrate superior stability and computational efficiency. Notably, our framework achieves convergence speeds significantly faster than online FEM by avoiding volumetric computations, while resolving ill-conditioning through spatial–frequency localization. This work advances deformable object manipulation in unstructured environments, particularly in surgical robotics, where stability under partial observability is essential. Project page: https://junleihu.github.io/projects/dwtbem/ .

IJRR 2026-04-24

Path planning and reinforcement learning-driven control of on-orbit free-flying multi-arm robots

Álvaro Belmonte-Baeza, José Luis Ramón, Leonard Felicetti, Miguel Cazorla, Jorge Pomares

Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

This paper presents a hybrid approach that integrates trajectory optimization (TO) and reinforcement learning (RL) for motion planning and control of free-flying multi-arm robots in on-orbit servicing scenarios. The proposed system integrates TO for generating feasible, efficient paths while accounting for dynamic and kinematic constraints, and RL for adaptive trajectory tracking under uncertainties. The multi-arm robot design, equipped with thrusters for precise body control, enables redundancy and stability in complex space operations. TO optimizes arm motions and thruster forces, reducing reliance on the arms for stabilization and enhancing maneuverability. RL further refines this by leveraging model-free control to adapt to dynamic interactions and disturbances. The experimental results validated through comprehensive simulations demonstrate the effectiveness and robustness of the proposed hybrid approach. Two case studies are explored: surface motion with initial contact and a free-floating scenario requiring surface approximation. In both cases, the hybrid method outperforms traditional strategies. In particular, the thrusters notably enhance motion smoothness, safety, and operational efficiency. The RL policy effectively tracks TO-generated trajectories, handling high-dimensional action spaces and dynamic mismatches. This integration of TO and RL combines the strengths of precise, task-specific planning with robust adaptability, ensuring high performance in the uncertain and dynamic conditions characteristic of space environments. By addressing challenges such as motion coupling, environmental disturbances, and dynamic control requirements, this framework establishes a strong foundation for advancing the autonomy and effectiveness of space robotic systems.

RA-L 2026-05-15

CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Ralf Römer, Yi Zhang, Yuming Li, Angela P. Schoellig

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

To teach robots complex manipulation tasks, it is now common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers for deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected VLA modules and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark and five real-world tasks, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods.

RA-L 2026-05-15

MAPLE: Multimodal Mamba Agent for Event Based Policy With Adaptive Value Estimation

Hemant Kumawat, Saibal Mukhopadhyay

Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

Event cameras provide microsecond temporal precision and high dynamic range, offering a powerful complement to conventional RGB sensing for real-world robotic control. Yet, their asynchronous output and sparse spatial structure make it challenging for reinforcement learning (RL) frameworks to exploit them alongside frame-based observations. We identify two key challenges: (1) under modality heterogeneity, both modality-correlated and modality-specific features are critical, yet they contribute differently to modeling temporal dynamics; and (2) multimodal inputs often arrive at different temporal scales, making naive up/downsampling ineffective, while attention-based networks alone struggle to reliably capture asynchronous structure. To address these challenges, we propose Maple, a unified dynamics modeling framework that bridges event-driven and frame-based perception for visual RL. At its core, Maple introduces a Cross-Modality Prompt Generator that aligns asynchronous event streams (temporally integrated over short control-synchronized windows) with synchronous RGB frames, producing modality-aware prompts that guide a Mamba state-space backbone. This design enables linear-time temporal reasoning over long horizons while disentangling shared and modality-specific representations. An adaptive dynamics head further integrates consistent and unique subspaces for robust policy learning under fast motion and extreme illumination. Experiments on CARLA event-based driving benchmarks demonstrate that Maple significantly outperforms prior RGB-only, event-only, and multimodal baselines, achieving higher sample efficiency, stronger policy stability, and superior robustness across challenging weather, lighting, and motion conditions.

RA-L 2026-05-15

Simulation of Adaptive Running With Flexible Sports Prosthesis Using Reinforcement Learning of Hybrid-Link System

Yuta Shimane, Ko Yamamoto

Robot Learning · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

This study proposes a reinforcement learning-based framework for adaptive running motion simulation in a unilateral transtibial amputee using a hybrid-link system that incorporates the flexibility of a leaf-spring-type sports prosthesis. The design and selection of sports prostheses typically rely on trial and error. A comprehensive whole-body dynamics analysis that accounts for interactions between human motion and prosthetic deformation can provide valuable insights for user-specific design and selection. The proposed hybrid-link system enables such analysis by integrating a Piece-wise Constant Strain (PCS) model to represent prosthetic flexibility. Based on this system, the simulation methodology generates whole-body dynamic motions of a unilateral transtibial amputee using a reinforcement learning approach. This framework integrates imitation learning based on motion capture data with accurate computation of prosthetic dynamics. Running motions are simulated under multiple virtual prosthetic stiffness conditions, and the corresponding metabolic cost of transport (COT) obtained from these simulations is analyzed. The results suggest that variations in prosthetic stiffness influence running dynamics and performance, and that the COT is consistent with values reported in a prior study. Our findings demonstrate the potential of the proposed approach for simulation and analysis under virtual conditions that differ from real-world conditions.

RA-L 2026-05-15

SI-Diff: A Framework for Learning Search and High-Precision Insertion With a Force-Domain Diffusion Policy

Yibo Liu, Stanko Oparnica, Simon Shewchun-Jakaitis, Guoyi Fu, Jie Wang, Jun Yang, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. To this end, we introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations provided by the teacher policy, the model learns the mapping from tactile and end-effector velocity observations to effective action behaviors. We conduct thorough experiments to show that SI-Diff extends the tolerance to x-y misalignments from 2 mm to 5 mm compared to the state-of-the-art baseline, TacDiffusion [1], while also demonstrating strong zero-shot transferability to unseen shapes.

RA-L 2026-06-01

Safety Guaranteed Control Synthesis With Multiple Backup Policies and a Varying Reachable Time Horizon

Sunwoo Hwang, Byeongjun Kim, Inkyu Jang, H. Jin Kim

Abstract

For safety-critical control tasks, control barrier functions (CBFs) are powerful tools for designing safety filters that ensure safety of the systems. However, obtaining a valid CBF that defines a sufficiently large safe set usually requires extensive handcrafting due to input constraints, which should be considered for most systems, especially robotic applications. To address this challenge, this letter proposes a general framework for constructing a safety filter that employs multiple backup policies. We generalize the existing backup CBF (BCBF) approach, where the proposed quadratic program (QP) renders the corresponding safe set forward invariant. The proposed method enforces CBF-like derivative constraints only at critical instants along the backup trajectory. Also, we introduce a varying reachable time to the backup set, instead of a fixed time horizon. These modifications yield a less conservative and computationally efficient safety filter while maintaining formal safety guarantees through nonsmooth analysis. Furthermore, the proposed framework is easily extended to the use of multiple backup strategies, which can effectively enlarge the resulting safe set. Real-world collision avoidance experiments validate the scalability of the overall framework and confirm its reduced conservativeness.
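As background for the safety-filter idea above, here is a minimal sketch of a plain CBF quadratic program, not the backup-CBF method proposed in the letter. It assumes a single-integrator robot ($\dot{x} = u$), one circular obstacle with barrier $h(x) = \lVert x - c \rVert^2 - R^2$, and no input bounds, so the QP with its single constraint $\nabla h \cdot u + \alpha h \ge 0$ has a closed-form solution. All names and parameters are hypothetical.

```python
def cbf_safety_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0 holds.

    Solves min ||u - u_nom||^2 subject to a . u + b >= 0 in closed form,
    with a = grad_h(x) = 2 (x - c) and b = alpha * h(x).
    """
    dx = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in dx) - radius ** 2
    a = [2.0 * d for d in dx]
    b = alpha * h
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if slack >= 0.0:
        return list(u_nom)  # nominal input already satisfies the constraint
    a_sq = sum(ai * ai for ai in a)
    if a_sq == 0.0:
        raise ValueError("constraint is infeasible at the obstacle center")
    # Project u_nom onto the boundary of the half-space a . u + b >= 0.
    return [ui - slack / a_sq * ai for ui, ai in zip(u_nom, a)]
```

With input constraints, this single-constraint QP can become infeasible, which is the kind of difficulty the abstract cites as motivation for backup policies.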

RA-L 2026-05-14

Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion

Chi Zhang, Mingrui Li, Wenzhe Tong, Xiaonan Huang

Legged / Quadruped Robots · Robot Learning · Control & Dynamics
Abstract

Tensegrity robots combine rigid rods and elastic cables, offering high resilience and deployability but at the same time posing major challenges for locomotion control due to their underactuated and highly coupled dynamics. This paper introduces a morphology-aware reinforcement learning framework that integrates a graph neural network (GNN) into the Soft Actor-Critic (SAC) algorithm. By representing the robot's physical topology as a graph, the proposed GNN-based policy captures coupling among components, enabling faster and more stable learning than conventional multilayer perceptron (MLP) policies. The method is validated on a physical 3-bar tensegrity robot across three locomotion primitives, including straight-line tracking and bidirectional turning. It shows superior sample efficiency, robustness to noise and stiffness variations, and improved trajectory accuracy. Additionally, the learned policies transfer directly from simulation to hardware without fine-tuning, achieving stable real-world locomotion. These results demonstrate the advantages of incorporating structural priors into reinforcement learning for tensegrity robot control. https://tensegrity-graph-rl.github.io/.

RA-L 2026-05-14

Data-Efficient Modeling of Hysteresis and Crosstalk for Inverse Kinematics of Soft Manipulators

Korn Borvorntanajanya, Fung Flora Leung, Jialei Shi, Enrico Franco, Philip Wai Yan Chiu, Yeung Yam, et al.

Manipulation & Robot Arms · Robot Learning · Medical / Soft / Micro-Nano
Abstract

Nonlinearities in soft continuum manipulators, arising from material hysteresis and intersegmental coupling in multi-segment robots, present significant challenges for accurate open-loop inverse kinematics (IK) control. In particular, morphable pneumatic chambers adjust their shape and stiffness with internal pressure, increasing force output but also introducing nonlinearities that complicate control. This paper introduces a sequence-based machine learning approach that is data-efficient, modeling and compensating for both hysteresis and crosstalk in systems with morphable chambers. Through systematic comparison of Long Short-Term Memory (LSTM) and Transformer architectures under data-limited conditions, we demonstrate the effectiveness of sequence-based models in capturing temporal dependencies. We then propose a Recursive Segment-wise Crosstalk Compensation (RSCC) pipeline that decomposes control of multi-segment robots into independent single-segment subproblems, with each constituent model trained using 500 samples. Applied to a two-segment morphable-chamber manipulator, RSCC achieves approximately 11% normalized positional error and outperforms a monolithic multi-segment LSTM baseline trained on 2000 samples within the same workspace, highlighting its potential for precise open-loop control in minimally invasive surgical applications.

RA-L 2026-05-14

E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning

Haoyuan Deng, Yudong Lin, Yuanjiang Xue, Haoyang Du, Qianzhun Wang, Boyang Zhou, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Human-in-the-loop guidance has emerged as an effective approach for accelerating online reinforcement learning (RL) in real-world manipulation. However, existing human-in-the-loop RL (HiL-RL) frameworks often suffer from low sample efficiency, requiring substantial human interventions to achieve convergence and thereby leading to high labor costs. To address this, we propose a sample-efficient real-world human-in-the-loop RL framework named E2HiL, which requires fewer human interventions by actively selecting informative samples. Specifically, stable reduction of policy entropy enables improved trade-off between exploration and exploitation with higher sample efficiency. We first build influence functions of different samples on the policy entropy, which is efficiently estimated by the covariance of action probabilities and soft advantages of policies. Then we select samples with moderate values of influence functions, where shortcut samples that induce sharp entropy drops and noisy samples with negligible effect are pruned. Extensive experiments across 10 real-world manipulation tasks, spanning multiple embodiments and learning frameworks, demonstrate that E2HiL improves success rates by 24.9% while reducing human interventions by 9.3 https://e2hil.github.io/ .

RA-L 2026-05-14

MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration Through Perception-Coupled Mapping and Planning

Tomáš Musil, Matěj Petrlík, Martin Saska

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Autonomous exploration of unknown environments is a key capability for mobile robots, but it is largely unsolved for robots equipped with only a single monocular camera and no dense range sensors. In this paper, we present MonoSpheres — a novel approach to monocular vision-based exploration that can safely cover large-scale unstructured indoor and outdoor 3D environments by explicitly accounting for the properties of a sparse monocular SLAM frontend in both mapping and planning. The mapping module solves the problems of sparse depth data, free-space gaps, and large depth uncertainty by oversampling free space in texture-sparse areas and keeping track of obstacle position uncertainty. The planning module handles the added free-space uncertainty through rapid replanning and perception-aware heading control. We further show that frontier-based exploration is possible with sparse monocular depth data when parallax requirements and the possibility of textureless surfaces are taken into account. We evaluate our approach extensively in diverse real-world and simulated environments, including ablation studies. To the best of the authors' knowledge, MonoSpheres is the first method to achieve 3D monocular exploration in real-world unstructured outdoor environments. We open-source our implementation to support future research.

JFR 2026-06-10

LIO‐RRTNav for Cattle Yard Inspection Robots: Prior Map Aided Relocalization and Goal‐Oriented, Smooth RRT Path Planning

Shuo Yang, Zhanhua Song, Shakeel Ahmed Soomro, Kai Wang, Yinfa Yan, Weizheng Shen, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Autonomous navigation for inspection robots in cattle barns critically depends on localization and path‐planning algorithms. To address the issues of low navigation accuracy, long planning time, and insufficient trajectory smoothness in barn environments, this study proposes a navigation framework that combines LiDAR–inertial odometry and a rapidly‐exploring random tree (LIO‐RRTNav). For relocalization, this study develops a Fast_LIO2 with relocalization and pose optimization method (RP‐Fast_LIO2). It exploits the geometric structure of the barn to associate the current frame point cloud with historical point clouds to suppress drift. Meanwhile, stable features are extracted from dynamic scenes and registered to the global map using the Iterative Closest Point (ICP) algorithm; the registration result is used as the initial state for an iterated extended Kalman filter (IEKF) to refine pose estimation. For path planning, this study proposes a highly efficient, robust, and smooth rapidly exploring random tree method (HRS‐RRT). By incorporating a goal‐oriented random sampling strategy and a safety‐distance constraint, it enables safe and efficient planning, and the resulting path is further optimized through path pruning and cubic B‐spline smoothing based on a spring potential energy model. Simulation results showed that the HRS‐RRT algorithm reduced path length, planning time, and number of iterations by 22.17%, 75.00%, and 83.09%, respectively, compared with the traditional RRT algorithm. The experimental results revealed that the RP‐Fast_LIO2 algorithm reduced the mean and root mean square error of the absolute position error by 83.21% and 79.89%, respectively, compared with the traditional Fast_LIO2 algorithm.
In navigation experiments conducted in two cattle yards, when the robot operated at speeds of 0.3, 0.5, and 1.0 m/s, the maximum lateral and longitudinal deviations did not exceed 0.13 and 0.08 m, respectively, and the maximum heading error did not exceed 7.24°. The results acquired verified the adaptability of the LIO‐RRTNav algorithm in the cattle yard environment, meeting the requirements of cattle yard inspection robot operation.
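The goal-oriented sampling and safety-distance ideas mentioned in the abstract can be sketched generically as below. This is a common goal-biased RRT sampler with a clearance check, offered as an illustration under stated assumptions and not as the HRS-RRT implementation: the 2-D workspace, circular obstacles given as `(x, y, radius)`, and the `goal_bias` parameter are all hypothetical.

```python
import math
import random

def sample_point(goal, bounds, goal_bias, rng):
    """Return the goal with probability goal_bias, else a uniform random point."""
    if rng.random() < goal_bias:
        return goal
    (x_min, x_max), (y_min, y_max) = bounds
    return (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))

def is_safe(point, obstacles, safety_distance):
    """True if the point keeps at least safety_distance from every obstacle surface."""
    return all(
        math.dist(point, (ox, oy)) >= radius + safety_distance
        for ox, oy, radius in obstacles
    )
```

A planner built on these helpers would reject any sampled or extended node for which `is_safe` is false. A larger `goal_bias` pulls the tree toward the goal more often, at the cost of less exploration around obstacles.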

RA-L 2026-05-12

PEMTRS: Perception-Enhanced Memory With Temporal Region Selection for Vision-Based Multirotor Navigation

Chenfeng Guo, Chen Su, Zhaopeng Zhang, Jianda Han, Yongchun Fang, Xiao Liang

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

End-to-end deep learning methods for autonomous aerial vehicles like multirotors have been successfully applied to high-agility flight. However, existing end-to-end navigation methods for multirotors typically rely solely on the robot's current observation and state, neglecting explicit modeling of temporal sequence information. Yet trajectory prediction is inherently temporal: past observations provide cues about future feasible regions, and this temporal feasibility is still not explicitly exploited in current end-to-end navigation frameworks. To overcome these limitations, we propose the PEMTRS framework, which enables the multirotor to fly with thinking by building a perception-enhanced memory structure and employing a temporal region selector to highlight regions of interest for traversability. By jointly modeling spatial perception and temporal information, PEMTRS generalizes better within the evaluated environment class than state-of-the-art end-to-end methods. To achieve more efficient training, we adopt an implicit learning strategy that replaces expert-labeled demonstration trajectories with cost-based learning signals to guide network optimization. In addition, we design an adaptive time-allocation network to dynamically adjust the time intervals between waypoints and a residual VAE to reconstruct obstacle depth for perception enhancement. Finally, simulation and real-world experiments validate the effectiveness and efficiency of the proposed method.

RA-L 2026-05-12

FruitTouch: A Perceptive Gripper for Gentle and Scalable Fruit Harvesting

Ruohan Zhang, Mohammad Amin Mirzaee, Wenzhen Yuan

Manipulation & Robot Arms · Perception & Sensing · Control & Dynamics
Abstract

The automation of fruit harvesting has gained increasing significance in response to rising labor shortages. A sensorized gripper is a key component of this process: it must be compact enough for confined spaces, able to stably grasp diverse fruits, and able to provide reliable feedback on fruit conditions for efficient harvesting. To address this need, we propose FruitTouch, a compact gripper that integrates high-resolution, vision-based tactile sensing through an optimized optical design. This configuration accommodates a wide range of fruit sizes while maintaining low cost and mechanical simplicity. Tactile images captured by an embedded camera provide rich information for real-time force estimation, slip detection, and softness prediction. We validate the gripper in real-world fruit harvesting experiments, demonstrating robust grasp stability and effective damage prevention. The hardware design files and simulation environment are open-sourced at the project website: https://fruittouch-dev.github.io/fruittouch/

RA-L 2026-04-06 · Cited by 1

An Open-Source, Reproducible Tensegrity Robot That Can Navigate Among Obstacles

William R. Johnson, Patrick Meng, Nelson Chen, Luca Cimatti, Augustin Vercoutere, Mridul Aanjaneya, et al.

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Tensegrity robots, composed of rigid struts and elastic tendons, provide impact resistance, low mass, and adaptability to unstructured terrain. Their compliance and complex, coupled dynamics, however, present modeling and control challenges, hindering planning and obstacle avoidance. This paper presents a complete, open-source, and reproducible system that enables navigation for a 3-bar tensegrity robot. The system comprises: (i) an inexpensive, open-source hardware design, and (ii) an integrated, open-source software stack for physics-based modeling, system identification, state estimation, path planning, and control. All hardware and software are publicly available at https://tensegrity.yale.edu/. The proposed system tracks the robot using a static overhead camera and executes collision-free paths to a goal among known obstacle locations. System robustness is demonstrated through experiments involving unmodeled environmental challenges, including a vertical drop, an incline, and granular media, culminating in an outdoor field demonstration. To validate reproducibility, experiments were conducted using robot instances at two different laboratories. This work provides the robotics community with a complete navigation system for a compliant, impact-resistant, and shape-morphing robot. This system is intended to serve as a springboard for advancing the navigation capabilities of other unconventional robotic platforms.

RA-L 2026-05-11

Dynamic Policy Learning for Legged Robot With Simplified Model Pretraining and Model-Homotopy-Inspired Transfer

Dongyun Kang, Min-Gyu Kim, Tae-Gyu Song, Hajun Kim, Sehoon Ha, Hae-Won Park

Humanoid Robots · Legged / Quadruped Robots · Robot Learning · Control & Dynamics
Abstract

Generating dynamic motions for legged robots remains a challenging problem. While reinforcement learning has achieved notable success in various legged locomotion tasks, producing highly dynamic behaviors often requires extensive reward tuning or high-quality demonstrations. Leveraging reduced-order models can help mitigate these challenges. However, the model discrepancy poses a significant challenge when transferring policies to full-body dynamics environments. In this work, we introduce a continuation-based learning framework that combines simplified model pretraining and model-homotopy-inspired transfer to efficiently generate and refine complex dynamic behaviors. First, we pretrain the policy using a single rigid body model to capture core motion patterns in a simplified environment. Next, we employ a continuation strategy to progressively transfer the policy to the full-body environment, minimizing performance loss. To define the continuation path, we introduce a parametric transition path from the single rigid body model to the full-body model by gradually redistributing mass and inertia between the trunk and legs. The proposed method achieves faster convergence and demonstrates superior stability during the transfer process compared to baseline methods. Our framework is validated on a range of dynamic tasks, including flips and wall-assisted maneuvers, and is successfully deployed on a real quadrupedal robot.
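
The continuation path described in the abstract can be pictured as gradually moving mass from the trunk to the legs. The sketch below is illustrative only: the function name `homotopy_masses`, the parameter `lam`, the linear schedule, and the example masses are all assumptions, and inertia redistribution (which the abstract also mentions) is omitted. It is not the authors' implementation.

```python
def homotopy_masses(trunk_mass, leg_masses, lam):
    """Interpolate between a single-rigid-body model and the full-body model.

    lam = 0: all mass is lumped into the trunk (single rigid body).
    lam = 1: trunk and legs carry their full-model masses.
    Total mass is conserved for every lam (assumed linear schedule).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    total = trunk_mass + sum(leg_masses)
    legs = [lam * m for m in leg_masses]  # legs gain mass as lam grows
    trunk = total - sum(legs)             # trunk gives up what the legs gain
    return trunk, legs


# Hypothetical quadruped: 8 kg trunk, four 1 kg legs.
FULL_TRUNK = 8.0
FULL_LEGS = [1.0, 1.0, 1.0, 1.0]

# Five stages of the continuation, from simplified to full model.
schedule = [homotopy_masses(FULL_TRUNK, FULL_LEGS, k / 4) for k in range(5)]
```

In a training loop, a policy would be fine-tuned at each stage of `schedule` before moving to the next, so the dynamics it faces never change abruptly.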

RA-L 2026-05-11

Robust Manipulation of Deformable Linear Objects

Jimmy Envall, Stelian Coros

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Robotic manipulation is a fundamental challenge in the pursuit of automating household tasks and advancing robotic manufacturing. Manipulation planning for deformable objects is particularly challenging due to the associated large action spaces and complex dynamics. For deformable objects, some actions are sensitive to noise in the model, which can significantly degrade the accuracy of predicted trajectories. In this letter, we present a motion planning algorithm for arranging deformable linear objects (DLOs) into user-provided goal configurations. Using a novel problem formulation as well as a combination of sampling and gradient based optimization, the algorithm finds sequences of grasps and control inputs that are robust. We demonstrate the effectiveness of the new algorithm using numerical experiments and manipulation examples both in simulation and in real world environments.

RA-L 2026-05-11

HiPAN: Hierarchical Posture-Adaptive Navigation for Quadruped Robots in Unstructured 3D Environments

Jeil Jeong, Minsung Yoon, Seokryun Choi, Heechan Shin, Taegeun Yang, Sung-eui Yoon

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Navigating quadruped robots in unstructured 3D environments poses significant challenges, requiring goal-directed motion, effective exploration to escape from local minima, and posture adaptation to traverse narrow, height-constrained spaces. Conventional approaches employ a sequential mapping-planning pipeline but suffer from accumulated perception errors and high computational overhead, restricting their applicability on resource-constrained platforms. To address these challenges, we propose Hierarchical Posture-Adaptive Navigation (HiPAN), a framework that operates directly on onboard depth images at deployment. HiPAN adopts a hierarchical design: a high-level policy generates strategic navigation commands—planar velocity and body posture—which are executed by a low-level, posture-adaptive locomotion controller. To mitigate myopic behaviors and facilitate long-horizon navigation, we introduce Path-Guided Curriculum Learning, which progressively extends the navigation horizon from reactive obstacle avoidance to strategic navigation. In simulation, HiPAN achieves higher navigation success rates and greater path efficiency than classical reactive planners and end-to-end baselines, while real-world experiments further validate its applicability across diverse, unstructured 3D environments.

RA-L 2026-05-11

Lighting-Robust Behavioral Cloning via Bayesian Visual Encoding for Robot Manipulation

Zhenyuan Dong, Dongxu Han, Zhi Wang, Zhiyi Yan, Qian Lin, Ziyi Zhang, et al.

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Behavioral cloning (BC) has demonstrated strong performance in robot manipulation by learning visuomotor policies directly from expert demonstrations. However, under illumination-induced distribution shifts, vision-based BC policies often become sensitive to spurious visual variations, leading to degraded reliability and generalization. To address this issue, we propose a lighting-robust BC framework that incorporates Bayesian deep learning into the visual encoder. Specifically, we model a posterior distribution over convolutional weights and optimize the evidence lower bound, where a Kullback–Leibler regularization term constrains feature learning under illumination variability. In simulation, the proposed method achieves higher success rates and shorter rollout lengths than deterministic baselines, while maintaining strong sample efficiency with a single Monte Carlo sample. In real-world experiments across seven illumination regimes, including distribution-shifted conditions, our method consistently outperforms BC baselines and remains stable under severe lighting changes. These results indicate that Bayesian regularization provides an effective inductive bias for learning visual representations that improve robustness and generalization in vision-based robot manipulation.
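
For reference, the evidence lower bound mentioned in the abstract has the standard variational form below. The notation is assumed here rather than taken from the paper: $w$ are the convolutional weights, $\mathcal{D}$ the demonstration data, $q_\phi$ the variational posterior, and $p$ the prior. The abstract does not specify the prior or the posterior family.

```latex
\mathcal{L}(\phi)
  = \mathbb{E}_{q_\phi(w)}\big[\log p(\mathcal{D} \mid w)\big]
  - \mathrm{KL}\big(q_\phi(w)\,\|\,p(w)\big)
```

Maximizing $\mathcal{L}$ trades data fit (first term) against the KL term, which keeps the posterior close to the prior. That is the regularizing role the abstract attributes to it.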

RA-L 2026-05-11

Task-Adaptive Admittance Control for Human-Quadrotor Cooperative Load Transportation With Dynamic Cable-Length Regulation

Shuai Li, Ton T. H. Duong, Damiano Zanotto

UAVs / Aerial Robots · Manipulation & Robot Arms · Multi-Robot / Swarms · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract

The collaboration between humans and robots is critical in many robotic applications, especially in those requiring physical human-robot interaction (pHRI). Previous research in pHRI has largely focused on robotic manipulators, employing impedance or admittance control to maintain operational safety. Conversely, research in human-quadrotor cooperative load transportation (CLT) is still in its infancy. This letter introduces a novel admittance controller designed for safe and effective human-quadrotor CLT using a quadrotor equipped with an actively-controlled winch. The proposed method accounts for the system's coupled dynamics, allowing the quadrotor and its cable to dynamically adapt to contact forces during CLT tasks, thereby enhancing responsiveness. We experimentally validated the task-adaptive capability of the controller across the entire CLT process, including in-place loading/unloading and load transporting tasks. To this end, we compared the system performances against a conventional approach, using both variable and fixed cable lengths under low- and high-stiffness conditions. Results demonstrate that the proposed method outperforms the conventional approach in terms of system responsiveness and motion smoothness, leading to improved CLT capabilities.
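
As background for the abstract above, a textbook one-dimensional admittance law maps a measured contact force to a commanded velocity through a virtual mass and damper. The sketch below assumes that generic law only. The function names and gains are hypothetical, and the paper's coupled quadrotor-cable dynamics and cable-length regulation are not modeled.

```python
def admittance_step(v, f_ext, mass, damping, dt):
    """One explicit-Euler step of  mass * dv/dt + damping * v = f_ext."""
    dv = (f_ext - damping * v) / mass
    return v + dt * dv


def simulate(f_ext, mass=2.0, damping=4.0, dt=0.01, steps=2000):
    """Apply a constant external force and return the final commanded velocity."""
    v = 0.0
    for _ in range(steps):
        v = admittance_step(v, f_ext, mass, damping, dt)
    return v


# A constant 2 N push settles at f_ext / damping = 0.5 m/s.
v_ss = simulate(f_ext=2.0)
```

Lower virtual damping makes the system yield more readily to the human's push, while higher damping makes it stiffer. Such gains are the kind of quantity a task-adaptive scheme would adjust.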

RA-L 2026-05-11

PolyMerge: Compressing 3D Gaussian Splats With Polytope Coverings for Provably Safe Resource-Constrained Navigation

Jihoon Hong, Chih-Yuan Chiu, Sara Fridovich-Keil, Glen Chou

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Obstacle avoidance is essential for safe navigation and motion planning. Recent radiance field reconstruction methods enable object detection and modeling with high fidelity, but remain too memory- and compute-intensive for on-board perception-based path planning. To address these limitations, we propose PolyMerge to convert a large, photorealistic 3D Gaussian Splatting (3DGS) model of a scene into a lightweight representation of convex polytopes whose union provably over-approximates all obstacles in the original 3DGS model. PolyMerge tunes the polytope count to trade off conservativeness and compute cost, and integrates with control barrier functions (CBFs) to plan collision-free paths. We showcase PolyMerge in simulation and hardware experiments on a Crazyflie drone, which uses PolyMerge to compute and follow safe trajectories in real time under severe onboard compute constraints, outperforming baselines in speed while guaranteeing safety. For our code and videos, visit https://athlon76.github.io/PolyMerge-website/.
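
To make "a convex polytope that over-approximates a Gaussian" concrete, the sketch below covers the k-sigma ellipse of a single 2D Gaussian with an axis-aligned box written as half-spaces. It is an illustration under stated assumptions, not PolyMerge itself: the 2D setting, the k-sigma obstacle definition, the box shape, and all names (`gaussian_box`, `margin`, `ellipse_boundary`) are hypothetical, and the paper's merging of many Gaussians into fewer polytopes is not shown.

```python
import math


def gaussian_box(mean, cov, k=3.0):
    """Half-spaces (a, b), meaning a.x <= b, of an axis-aligned box containing
    the ellipse {x : (x - mean)^T cov^-1 (x - mean) <= k^2}."""
    halfspaces = []
    for i in range(2):
        half = k * math.sqrt(cov[i][i])  # ellipse extent along axis i
        e = [1.0 if j == i else 0.0 for j in range(2)]
        halfspaces.append((e, mean[i] + half))                   #  x_i <= mean_i + half
        halfspaces.append(([-c for c in e], -(mean[i] - half)))  # -x_i <= -(mean_i - half)
    return halfspaces


def margin(halfspaces, x):
    """max_i (a_i . x - b_i): <= 0 inside the polytope, > 0 outside."""
    return max(sum(aj * xj for aj, xj in zip(a, x)) - b for a, b in halfspaces)


def ellipse_boundary(mean, cov, k, n=360):
    """Points on the k-sigma ellipse, generated with the Cholesky factor of cov."""
    l00 = math.sqrt(cov[0][0])
    l10 = cov[0][1] / l00
    l11 = math.sqrt(cov[1][1] - l10 * l10)
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        u0, u1 = math.cos(t), math.sin(t)
        pts.append((mean[0] + k * l00 * u0,
                    mean[1] + k * (l10 * u0 + l11 * u1)))
    return pts


MEAN, COV, K = (1.0, -2.0), [[0.5, 0.2], [0.2, 0.3]], 3.0
BOX = gaussian_box(MEAN, COV, K)
# Largest margin over the ellipse boundary; non-positive means the box covers it.
worst = max(margin(BOX, p) for p in ellipse_boundary(MEAN, COV, K))
```

The sign convention of `margin` (positive only outside the polytope) is the kind of quantity a safety constraint can be written against. A tighter polytope would be less conservative at the cost of more faces.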

RA-L 2026-05-11

X-IONet: Cross-Platform Inertial Odometry Network for Pedestrian and Legged Robot

Dehan Shen, Changhao Chen

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Learning-based inertial odometry has achieved remarkable progress in pedestrian navigation. However, extending these methods to quadruped robots remains challenging due to their distinct and highly dynamic motion patterns. Models that perform well on pedestrian data often experience severe degradation when deployed on legged platforms. To tackle this challenge, we introduce X-IONet, a cross-platform inertial odometry framework that operates solely using a single Inertial Measurement Unit (IMU). X-IONet incorporates a rule-based expert selection module to classify motion platforms and route IMU sequences to platform-specific expert networks. The displacement prediction network features a dual-stage attention architecture that jointly models long-range temporal dependencies and inter-axis correlations, enabling accurate motion representation. It outputs both displacement and associated uncertainty, which are further fused through an Extended Kalman Filter (EKF) for robust state estimation. Extensive experiments on the public RoNIN pedestrian dataset, the GrandTour quadruped dataset, and a self-collected Go2 quadruped dataset demonstrate that X-IONet achieves state-of-the-art performance, reducing ATE and RTE by 14.3% and 11.4% on RoNIN, 11.8% and 9.7% on GrandTour, and 52.8% and 41.3% on Go2. These results highlight X-IONet's effectiveness for accurate and robust inertial navigation across both human and legged robot platforms.
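
The abstract reports ATE and RTE without defining them. Assuming they denote the absolute and relative trajectory errors commonly used to evaluate inertial odometry, a simplified 2D version of each can be sketched as follows. The exact evaluation protocol of the paper (alignment, window length) is not given in the abstract, so the `window` parameter and both function bodies are assumptions.

```python
import math


def ate(est, gt):
    """Absolute trajectory error: RMSE of per-sample position differences."""
    sq = [(ex - gx) ** 2 + (ey - gy) ** 2
          for (ex, ey), (gx, gy) in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))


def rte(est, gt, window):
    """Relative trajectory error: RMSE of displacement differences
    over a fixed window of samples."""
    sq = []
    for i in range(len(gt) - window):
        d_est = (est[i + window][0] - est[i][0], est[i + window][1] - est[i][1])
        d_gt = (gt[i + window][0] - gt[i][0], gt[i + window][1] - gt[i][1])
        sq.append((d_est[0] - d_gt[0]) ** 2 + (d_est[1] - d_gt[1]) ** 2)
    return math.sqrt(sum(sq) / len(sq))


# Straight-line ground truth; the estimate is offset by a constant (3, 4).
gt = [(float(i), 0.0) for i in range(20)]
est = [(x + 3.0, y + 4.0) for x, y in gt]

ate_val = ate(est, gt)            # penalizes the constant offset
rte_val = rte(est, gt, window=5)  # blind to it: displacements match exactly
```

In this toy case ATE is 5 while RTE is 0, which shows why the two are reported together: ATE captures global consistency, RTE captures local drift.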

RA-L 2026-05-11

Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation

Steven Caro, Stephen L. Smith

Manipulation & Robot Arms · Robot Learning · Control & Dynamics
Abstract

Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. This architecture combines the long-term reward maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available at https://github.com/carosteven/HeRD.

RA-L 2026-05-11

Remote Magnetic Levitation Using Reduced Attitude Control and Parametric Field Models

Neelaksh Singh, Jasan Zughaibi, Denis von Arx, Bradley J. Nelson, Michael Muehlebach

Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Electromagnetic navigation systems (eMNS) are increasingly used in minimally invasive procedures such as endovascular interventions and targeted drug delivery due to their ability to generate fast and precise magnetic fields. In this paper, we utilize the OctoMag and a custom 13-coil eMNS to achieve remote levitation and control of multiple rigid bodies across large air gaps, showcasing the dynamic capabilities of such systems. A compact parametric analytical model maps coil currents to the forces and torques acting on the levitating object, eliminating the need for computationally expensive simulations or lookup tables and establishing a levitator- and platform-agnostic control framework. Translational motion is stabilized using linear quadratic regulators. A nonlinear time-invariant controller regulates the reduced attitude, accounting for the inherent uncontrollability of rotations about the dipole axis and stabilizing the full five-degree-of-freedom controllable pose subspace. We analyze key design limitations and evaluate the approach through trajectory tracking experiments across different objects and actuation platforms. Notably, our proposed controller outperforms an equivalent baseline PID formulation, reliably tracking large spatial angles up to 65°. This work demonstrates the dynamic capabilities and potential of feedback control in electromagnetic navigation, which is likely to open up new medical applications.
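
The "reduced attitude" idea can be illustrated with a few lines of vector algebra. The magnetic torque on a dipole m in a field B is m × B, which has no component along m, so only the pointing direction of the dipole axis can be steered. The sketch below measures attitude error as the angle between the current and desired axis directions and checks that spinning the body about its own dipole axis leaves that error unchanged. The function names and test angles are hypothetical, and the paper's controller is not reproduced.

```python
import math


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate vector v about a unit axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(axis, v))
    cross = (axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0])
    return tuple(v[i] * c + cross[i] * s + axis[i] * dot * (1.0 - c)
                 for i in range(3))


def reduced_attitude_error(axis_now, axis_des):
    """Angle (radians) between the current and desired dipole directions."""
    dot = sum(a * b for a, b in zip(axis_now, axis_des))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


desired = (0.0, 0.0, 1.0)
# Tilt the dipole axis by 65 degrees about the x-axis.
tilted = rotate(desired, (1.0, 0.0, 0.0), math.radians(65.0))
err = reduced_attitude_error(tilted, desired)

# Spin the body about its own dipole axis: the axis direction does not move.
spun = rotate(tilted, tilted, math.radians(120.0))
err_spun = reduced_attitude_error(spun, desired)
```

Because `err_spun` equals `err`, a controller built on this error never asks for a rotation the actuation cannot produce.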

RA-L 2026-05-11

HALO: Language-Conditioned Overhead Monocular Aerial Exploration and Navigation

Yuezhan Tao, Dexter Ong, Fernando Cladera, Jason Hughes, Camillo J. Taylor, Pratik Chaudhari, et al.

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

We demonstrate real-time overhead aerial metric-semantic mapping and exploration using a monocular camera paired with a global positioning system (GPS). Our system, named HALO, addresses two key challenges: (i) real-time dense 3D reconstruction using vision at large distances, and (ii) mapping and exploration of large-scale outdoor environments with accurate scene geometry and semantics. We demonstrate that HALO can plan informative paths that exploit this information to complete missions with multiple tasks specified in natural language. In simulation-based evaluation across large-scale environments of size up to 78,000 sq. m., HALO consistently completes tasks with less exploration time and achieves up to 68% higher competitive ratio in terms of the distance traveled compared to the state-of-the-art semantic exploration baseline. We use real-world experiments on a custom quadrotor platform to demonstrate that (i) all modules can run onboard the robot, and that (ii) in diverse environments HALO can support effective autonomous execution of missions covering up to 24,600 sq. m. area at an altitude of 40 m.

RA-L 2026-05-11

FlickerTac: Flickering LED Driven Photometric Stereo for Event Vision-Based Tactile Sensors

Mohamad Halwani, Akram Khairi, Hussain Sajwani, Eslam Sherif, Laith AbuAssi, Sajid Javed, et al.

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Tactile sensing at high speed and resolution is critical for robotic perception and control. Existing vision-based tactile sensors (VBTSs) achieve high spatial accuracy but suffer from degraded performance in dynamic interactions due to motion blur and low frame rates, limiting their effectiveness in fast manipulation. Event-based cameras, with their high-temporal-resolution output, offer a promising alternative—but their inability to capture static scenes and absolute intensities has hindered their use for dense 3D reconstruction even under static contact conditions. We introduce FlickerTac, the first Event Vision-Based Tactile Sensor (EVBTS) capable of performing photometric stereo-based dense 3D reconstruction. FlickerTac replaces conventional 3-channel RGB illumination with time-multiplexed, phase-shifted flickering LEDs to encode directional shading into temporally separated channels. A lightweight neural network then maps these spatio-temporal representations directly to surface gradients, which are integrated to recover dense depth maps. Our design enables high-speed, accurate tactile 3D geometry estimation. Extensive experiments demonstrate that FlickerTac achieves millimeter-scale depth accuracy competitive with state-of-the-art VBTSs, while operating at 125 Hz—unmatched by existing frame-based VBTSs.
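
For readers unfamiliar with photometric stereo, the classical Lambertian version recovers a surface normal from the brightness seen under three known light directions, and the surface gradients follow from the normal. The sketch below shows only that textbook formulation. FlickerTac instead uses event data and a learned network, so the light directions, function names, and Lambertian assumption here are illustrative rather than taken from the paper.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m x = rhs for a 3x3 system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    sol = []
    for j in range(3):
        mj = [[rhs[i] if c == j else m[i][c] for c in range(3)] for i in range(3)]
        sol.append(det3(mj) / d)
    return sol


def photometric_stereo(lights, intensities):
    """Lambertian model I_k = albedo * (l_k . n): recover albedo and unit normal."""
    g = solve3(lights, intensities)  # g = albedo * n
    albedo = math.sqrt(sum(c * c for c in g))
    return albedo, tuple(c / albedo for c in g)


def surface_gradient(normal):
    """Depth gradients (dz/dx, dz/dy) implied by a normal with n_z != 0."""
    nx, ny, nz = normal
    return (-nx / nz, -ny / nz)


# Synthetic pixel: known normal and albedo, three hypothetical light directions.
TRUE_N, TRUE_ALBEDO = (0.6, 0.0, 0.8), 0.5
LIGHTS = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
OBS = [TRUE_ALBEDO * sum(l[i] * TRUE_N[i] for i in range(3)) for l in LIGHTS]

albedo, normal = photometric_stereo(LIGHTS, OBS)
grad = surface_gradient(normal)
```

Integrating such per-pixel gradients over the image yields a depth map, which is the last step the abstract describes.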

RA-L 2026-05-11

HOTFLoc++: End-to-End Hierarchical LiDAR Place Recognition, Re-Ranking, and 6-DoF Metric Localisation in Forests

Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani

UAVs / Aerial Robots · Robot Learning · Perception & Sensing
Abstract

This article presents HOTFLoc++, an end-to-end hierarchical framework for LiDAR place recognition, re-ranking, and 6-DoF metric localisation in forests. Leveraging an octree-based transformer, our approach extracts features at multiple granularities to increase robustness to clutter, self-similarity, and viewpoint changes in challenging scenarios, including ground-to-ground and ground-to-aerial in forest and urban environments. We propose learnable multi-scale geometric verification to reduce re-ranking failures due to degraded single-scale correspondences. Our joint training protocol enforces multi-scale geometric consistency of the octree hierarchy via joint optimisation of place recognition with re-ranking and localisation, improving place recognition convergence. Our system achieves localisation errors comparable to or lower than those of baselines, with runtime improvements of almost two orders of magnitude over RANSAC-based registration for dense point clouds. Experimental results on public datasets show the superiority of our approach compared to state-of-the-art methods, achieving an average Recall@1 of 90.7% on CS-Wild-Places: an improvement of 29.6 percentage points over baselines, while maintaining high performance on single-source benchmarks with an average Recall@1 of 91.7% and 97.9% on Wild-Places and MulRan, respectively. Our method achieves under 2 m and 5° error for 97.2% of 6-DoF registration attempts, with our multi-scale re-ranking module reducing localisation errors by ~2x on average. The code is available at https://github.com/csiro-robotics/HOTFLoc.

RA-L 2026-05-11

GIL-3D: U-Shaped Diffusion Transformers for Generalizable 3D Imitation Learning

Xiyue Wang, Aoran Mei, Linzhi Wu, Zhongxue Gan, Guo-Niu Zhu

Manipulation & Robot Arms · Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

Imitation learning with 3D vision effectively alleviates the impact of variations in lighting, background, and texture. It exhibits superior robustness compared to 2D-based methods. However, existing 3D imitation learning methods often suffer from performance degradation as the task horizon increases, primarily due to insufficient modeling of temporal dependencies and misalignment between states and actions. To address these challenges, we propose GIL-3D, a novel framework that combines the stable training dynamics of diffusion models with the global temporal modeling capability of Transformers. To further enhance multi-scale temporal dependency modeling, we explore alternative U-shaped hierarchical architectures and introduce strided skip connections to reduce redundancy in dense feature fusion. Moreover, we present a full-sequence joint attention mechanism to strengthen cross-modal interactions and improve the consistency between visual perception and action generation. Extensive experiments demonstrate that our model consistently outperforms existing baselines, achieving a 16.1% improvement in success rate on simulated benchmarks and a 15.83% improvement on real-world manipulation tasks. In addition, comprehensive generalization studies show that GIL-3D maintains robust performance in previously unseen scenarios.

RA-L 2026-05-11

Disturbance-Robust Dynamical System Learning With Neural ODEs and Flow-Matching Augmentation

Bang Liu, Pingyun Nie, Zhuang Fu, Tianxiang Jiang, Zi Fang, Jianfeng Yao, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

Autonomous dynamical systems (DS) are essential for imitation learning but often face challenges in simultaneously achieving high accuracy, stability guarantees, and resistance to disturbances. To overcome these limitations, this paper proposes a globally stable DS with trajectory attraction and disturbance robustness, termed K-Canyon DS (KCDS). The method first employs a Neural ODE to learn a diffeomorphic mapping from the task space to a linear latent space, thereby enhancing learning accuracy while ensuring global stability. Subsequently, based on clustering of starting points in the linear latent space, a nonlinear learnable trajectory-attractive field with a canyon-like structure is introduced to confer disturbance robustness. System parameters are trained via flow matching-based data augmentation, ultimately achieving a globally stable and disturbance-robust DS design. The proposed method is thoroughly evaluated on 2D and 3D LASA datasets, including comparative experiments with open-source baselines under disturbance. Real-robot experiments further demonstrate its practical applicability. The codebase is available at https://github.com/bbg-sjtu/KCDS.git.

RA-L 2026-05-11

Prompt2Craft: Generating Functional Craft Assemblies With LLMs

Vitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe, Yukiyasu Domae, Weiwei Wan, Kensuke Harada

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labeled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.

RA-L 2026-05-11

A Model-Based Visual Contact Localization and Force Sensing System for Compliant Robotic Grippers

Kaiwen Zuo, Shuyuan Yang, Zonghe Chua

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Grasp force estimation can help prevent robots from damaging delicate objects during manipulation and improve learning-based robotic control. Integrating force sensing into deformable grippers negotiates trade-offs in cost, complexity, mechanical robustness, and performance. With the growing integration of RGB-D wrist cameras into robotic systems for control purposes, camera-based techniques are a promising solution for indirect visual force estimation. Current approaches mostly utilize end-to-end deep learning, which can be brittle when generalizing to new scenarios, while existing model-based approaches are unsuited to grasping and modern grasper geometries. To address these challenges, we developed a model-based visual force sensing approach integrating an iterative contact localization with generalization to unseen objects. The system extracts structural key points from wrist camera RGB-D images of deforming finray-shaped soft grippers, and uses these key points to define parameters of an inverse finite element analysis simulation in Simulation Open Framework Architecture. The iterative contact localization sub-system utilizes a deep learning-based online 3D reconstruction and pose estimation pipeline to dynamically update contact location, and is robust to visual occlusion and unseen objects. Our system demonstrated an average root mean square error of 0.23 N and normalized root mean square deviation of 2.11% during the load phase, and 0.48 N and 4.34% over the entire grasping process when interacting with different objects under various conditions, showcasing its potential for real-time model-based indirect force sensing of soft grippers.

RA-L 2026-05-11

BEV-OSP: Obstacle State Prediction in Bird’s-Eye View to Enable Obstacle Avoidance and Navigation in Dynamic Environments

Zejie Jiang, Jinyang Lai, Yunlong Liu

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

Despite the prevalence of deep reinforcement learning (DRL) for navigation in dynamic environments, existing end-to-end DRL methods still struggle to effectively balance global planning with local obstacle avoidance and to perceive the dynamics of obstacles. Meanwhile, external detection methods are also often susceptible to noise and limited in accuracy. To address this, we propose an efficient perception framework, Obstacle State Prediction in Bird's-Eye View (BEV-OSP), which generates an intermediate Bird's-Eye View (BEV) feature by fusing sensor data with robot velocity. This representation filters out irrelevant sensory noise and static background details without relying on any third-party detection module. We further integrate a self-supervised auxiliary task based on Bootstrap Your Own Latent (BYOL) contrastive learning into the DRL framework to predict obstacle states in latent space, which boosts the encoder's perception of environmental dynamics. Finally, to ensure continuous navigation, a subgoal prediction module that can generate waypoints when the global planner is unavailable is introduced. Extensive experiments including successful real-world deployments in various scenarios demonstrate that BEV-OSP outperforms state-of-the-art (SOTA) methods in evaluations across multiple navigation metrics.

RA-L 2026-03-30 · Cited by 1

PRIX: Learning to Plan From Raw Pixels for End-to-End Autonomous Driving

Maciej Wozniak, Lianhang Liu, Yixi Cai, Patric Jensfelt

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors, and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw pIXels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories directly from raw pixel inputs. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. PRIX achieves SOTA performance on the NavSim-v2 and nuScenes datasets. On NavSim-v1, it also outperforms the majority of multimodal planners and other camera-only approaches on most metrics. Critically, PRIX is significantly more efficient on NavSim-v1, boasting faster inference speeds and a smaller model size. This combination of performance and efficiency makes it a practical solution for real-world deployment. Our work is open-source and the code will be available upon publication. See the project website for more: https://maxiuw.github.io/prix.

JFR 2026-06-19

Outracing a National Level Model Racing Car Champion: A Hybrid Model‐Based Data‐Driven Approach

Mustafa Alp, Matteo Corno, Giulio Panzani, Sergio Matteo Savaresi

Abstract

This paper discusses lap time optimization, focusing on a single lap without considering opponents in autonomous racing. The paper presents a control and optimization architecture composed of a model‐based low level controller and a higher level iterative learning algorithm with the goal of obtaining the fastest qualifying lap in autonomous racing competitions. First principles models are extremely expensive to calibrate near the handling limit; to address this, our algorithm learns the position varying acceleration limits of the vehicle over multiple laps. The proposed algorithm brings together the robustness and generalization capability of model‐based approaches with the performance of data‐driven methods. To validate the approach and its computational efficiency, we implement the solution on a high performance small scale vehicle and test it against a human driver on a racing track with speeds up to 50 km/h and lateral accelerations of 1.2 g. The proposed approach beats a national level champion in terms of qualifying lap for small scale vehicles, on the considered test track.
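
To see why position-varying acceleration limits matter for lap time, consider the point-mass cornering bound v = sqrt(a_lat / |kappa|): a higher lateral-acceleration limit at a given track position raises the speed that can be carried through it. The sketch below only evaluates that bound for a given set of limits. It is not the paper's learning algorithm, and the function name, curvatures, and limits are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def speed_caps(curvatures, lat_acc_limits):
    """Per-segment speed cap v = sqrt(a_lat / |kappa|) for a point-mass vehicle.

    curvatures: path curvature per track segment, in 1/m (0 means straight).
    lat_acc_limits: lateral acceleration limit per segment, in m/s^2.
    """
    caps = []
    for kappa, a_max in zip(curvatures, lat_acc_limits):
        if kappa == 0:
            caps.append(math.inf)  # no cornering limit on a straight
        else:
            caps.append(math.sqrt(a_max / abs(kappa)))
    return caps


# Hypothetical track: a straight, then a 10 m radius corner taken at 1.2 g.
caps = speed_caps([0.0, 0.1], [1.0 * G, 1.2 * G])
```

An iterative scheme of the kind the abstract describes would update the entries of `lat_acc_limits` lap by lap, and the speed profile would follow from bounds like this one together with longitudinal limits.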

JFR 2026-06-01

Performance Evaluation of Different Laser SLAM Algorithms for Unmanned Mining Vehicles

Jiangdong Wu, Tengfei Zhang, Qun Chao, Chengliang Liu

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Control & Dynamics
Abstract

Unmanned mining technology is essential for enhancing safety, increasing efficiency, and reducing operational costs. The complex and hazardous nature of mining environments demands advanced positioning systems for autonomous vehicles, with laser simultaneous localization and mapping (SLAM) algorithms playing a critical role. This paper provides a systematic review of the core technical modules within laser SLAM algorithms, analyzing their development trends, strengths, and weaknesses. A comprehensive evaluation of fifteen mainstream SLAM algorithms on the AutoMine open‐pit mining data set reveals significant insights. Experimental results demonstrate that traditional feature‐based algorithms are prone to significant trajectory drift due to feature loss in sparse‐feature mining environments. Notably, the study further identifies a specific “Ramp Drift” mechanism where recursive estimators suffer Z‐axis instability on monotonic slopes. Comparative analysis suggests that while Light Detection and Ranging (LiDAR)–inertial fusion generally enhances robustness, degeneracy‐aware architectures are the decisive factor for stability. Specifically, the LiDAR‐only odometry GLO achieves the highest stability in relative pose error due to its Weighted Elastic Matching strategy, while the Adaptive‐LIO demonstrates superior global consistency in absolute pose error. This study highlights a current lack of SLAM architectures specifically optimized for the unique challenges of open‐pit mines. Future research should focus on feature extraction enhancement in open scenes, the development of optimized LiDAR–inertial–RTK fusion architectures, and the integration of artificial intelligence to improve adaptability in dynamic and degraded scenarios.

T-RO 2026-04-21

Stratified Topological Autonomy for Long-Range Coordination (STALC)

Cora A. Duggan, Adam Goertz, Adam Polevoy, Mark Gonzales, Kevin C. Wolfe, Bradley Woosley, et al.

Perception & Sensing · Multi-Robot / Swarms
Abstract

In this paper, we present STALC, a hierarchical planning approach for multi-robot coordination in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.

T-RO 2026-04-21

Fast and Robust Online Initialization of Monocular Visual-Inertial Odometry via One-Dimensional Cost Approximation

Jiseock Kang, Jaeu Choe, Doyoon Kong, Byoungkwon Yoon, Jaehwi Cho, Dongjun Lee

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

In this article, we present a fast and accurate initialization algorithm for monocular visual-inertial odometry (MVIO). Our algorithm enables MVIO initialization within sub-second motion time (0.45 s for Q1 and 0.7 s for Q2) without divergence over 1500 random runs on the EuRoC MAV and TUM-VI datasets. Our algorithm is based on an almost-exact one-dimensional (1D) approximation of the loosely coupled MVIO (LC-MVIO) problem, which is obtained by approximately decoupling the optimization of the vision rotation error from the original LC-MVIO optimization. Thanks to the efficiency of 1D domain exploration, this approximated LC-MVIO is feasible without an initial guess and can therefore be used as an initializer of the original MVIO. Based on this 1D formulation, we design our initializer with an exhaustive correctness guarantee, a novel estimator guarantee we propose in this work, which certifies the absence of multiple, ambiguous solution modes with similar likelihood cost. Our algorithm is validated through Monte-Carlo simulation, real-world dataset experiments, and a real-vehicle experiment. An open-source MVIO pipeline with the suggested initialization algorithm is also provided as a supplement. Source code: https://github.com/cyclops-double-blind/cyclops

T-RO 2026-04-21

Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Yunfan Gao, Florian Messerer, Niels van Duijkeren, Rashmi Dabir, Moritz Diehl

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

This paper presents a novel approach for collision avoidance in optimal and model predictive control, in which the environment is represented by a large number of points and the robot as a union of padded polygons. The conditions that none of the points shall collide with the robot can be written in terms of an infinite number of constraints per obstacle point. We show that the resulting semi-infinite programming (SIP) optimal control problem (OCP) can be efficiently tackled through a combination of two methods: local reduction and an external active-set method. Specifically, this involves iteratively identifying the closest point obstacles, determining the lower-level distance minimizer among all feasible robot shape parameters, and solving the upper-level finitely-constrained subproblems. In addition, this paper addresses robust collision avoidance in the presence of ellipsoidal state uncertainties. Enforcing constraint satisfaction over all possible uncertainty realizations extends the dimension of constraint infiniteness. The infinitely many constraints arising from translational uncertainty are handled by local reduction together with the robot shape parameterization, while rotational uncertainty is addressed via a backoff reformulation. A controller implemented based on the proposed method is demonstrated on a real-world robot running at 20 Hz, enabling fast and collision-free navigation in tight spaces. An application to 3D collision avoidance is also demonstrated in simulation.

T-RO 2026-04-21

A Baseline Torque Controller Synchronized With Adaptive Oscillators Improves Transparency of a Six DoF Lower Limb Exoskeleton

Rafhael M. Andrade, Benito L. Pugliese, Abolfazl Mohebbi, Paolo Bonato

Legged / Quadruped Robots · Medical / Soft / Micro-Nano
Abstract

Lower-limb exoskeletons (LLE) have proven to be effective tools in rehabilitation for diminishing gait impairments. However, modern LLE control systems based on the assistance-as-needed concept face challenges in providing appropriate interaction torques to the user. This is especially true when it is necessary to make the exoskeleton transparent, i.e., with zero user-robot interaction torques (τUR), even during walking speed variations, when no assistance is required. To address these limitations, in this work we designed a novel transparent control system that combines a baseline torque controller, which generates a torque profile based on τUR from previous strides and drives the actuators to lower τUR, with a synchronization layer comprising a pool of adaptive oscillators (AOs) that estimates the user's gait phase and adapts the controller to different walking speeds. Additionally, a zero-torque controller was used in parallel to the main torque controller to enhance users' volitional control and balance. The proposed controller was experimentally tested on a six-DoF LLE with eight healthy participants who walked for 200 gait strides at three walking speeds. The baseline torque controller was shown to adapt to different walking speeds and to improve exoskeleton transparency, in terms of both user-robot interaction and gait kinematics, compared to our prior controller based on a zero impedance model, reducing the average τUR by 40% for the entire exoskeleton and by 70% for the hip joint at 0.8 m/s.

T-RO 2026-04-21

Scalable Unseen Objects 6-DoF Absolute Pose Estimation With Robotic Integration

Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Hossein Rahmani, et al.

Manipulation & Robot Arms · Perception & Sensing
Abstract

Pose estimation-guided unseen object 6-DoF robotic manipulation is a key task in robotics. However, the scalability of current pose estimation methods to unseen objects remains a fundamental challenge, as they generally rely on CAD models or dense reference views of unseen objects, which are difficult to acquire and ultimately limit their scalability. In this paper, we introduce a novel task setup, referred to as SinRef-6D, which addresses 6-DoF absolute pose estimation for unseen objects using only a single pose-labeled reference RGB-D image captured during robotic manipulation. This setup is more scalable yet technically nontrivial due to large pose discrepancies and the limited geometric and spatial information contained in a single view. To address these issues, our key idea is to iteratively establish point-wise alignment in a common coordinate system with state space models (SSMs) as backbones. Specifically, to handle large pose discrepancies, we introduce an iterative object-space point-wise alignment strategy. Then, Point and RGB SSMs are proposed to capture long-range spatial dependencies from a single view, offering superior spatial modeling capability with linear complexity. Once pre-trained on synthetic data, SinRef-6D can estimate the 6-DoF absolute pose of an unseen object using only a single reference view. With the estimated pose, we further develop a hardware-software robotic system and integrate the proposed SinRef-6D into it in real-world settings. Extensive experiments on six benchmarks and in diverse real-world scenarios demonstrate that our SinRef-6D offers superior scalability. Additional robotic grasping experiments further validate the effectiveness of the developed robotic system.

JFR 2026-06-18

Heavy‐UUV Docking System for a Fixed Seabed Station Based on Differential Optical‐Guidance Beacons

Kai Sun, Yiyang Li, Zekai Han, Jichao Lang, Xiaojun Han

Abstract

Heavy unmanned underwater vehicles (UUVs) with high‐capacity batteries and various payloads demand advanced energy replenishment and data retrieval technologies. Conventional suspended docking stations are unable to effectively counteract the impact forces of heavy UUVs, making seabed docking a more feasible solution. This study presents a seabed docking system for heavy UUVs that utilizes innovative differential optical‐guidance beacons, significantly enhancing the effective docking range to 35 m. Through the study of underwater optical characteristics, an effective‐workspace index for underwater optical guidance was proposed, guiding the design of the differential beacons. Upon successful docking, the system enables high‐power magnetic‐coupling wireless power transfer (MC‐WPT) and high‐speed underwater wireless optical communication (UWOC). The MC‐WPT system operates at 6.6 kW, a high power level for underwater wireless charging, and achieves an efficiency of 88.6%. A maximum data rate of 27 Mbps was attained by the UWOC system by employing a “wide‐angle transmission and wide‐field‐of‐view reception” technique. Validation tests conducted at a depth of 60 m in Qiandao Lake, using a UUV with a diameter of 533 mm, demonstrated a 100% successful docking rate. The results validate the feasibility and robustness of the proposed docking system in real‐world underwater conditions.

RA-L 2026-05-15

Design and Control of Centimeter-Scale Reconfigurable Aquatic Modular Robots

Kevin Macauley, Michael Zhao, Zhiheng Chen, Wei Wang

Multi-Robot / Swarms · Control & Dynamics
Abstract

This paper presents the Centimeter-scale Autonomous Reconfigurable Platform (CARP), a low-cost, self-assembling Autonomous Surface Vehicle (ASV) designed for modular and cooperative maritime operations. CARP features a 90 mm square top section with four independent active latching mechanisms, enabling dynamic connection and disconnection between units. The vehicle's motion is controlled using a Linear Quadratic Regulator (LQR)-based optimal controller with weak formulation-based dynamic parameter estimation for improved performance and adaptability. We demonstrate CARP's capabilities through trajectory tracking experiments and multi-robot reconfiguration into collective structures. CARP's compact, low-cost, and modular design enables rapid fabrication and deployment of multiple vehicles, making it an effective platform for developing and testing control, reconfiguration, and swarm robotics algorithms.

RA-L 2026-05-15

CPP: Cooperative Priority Planning for Multi-Agent Path Finding With Coordination and Action Durations

Zhengchen Li, Boyu Li, Weimin Wu, Dacheng Li

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms
Abstract

Modern multi-agent systems are evolving toward scalability, heterogeneity, and collaboration, where agents are required to coordinate to perform shared tasks. However, existing Cooperative Multi-Agent Path Finding (Co-MAPF) research typically assumes instantaneous actions. In reality, individual and joint actions require non-negligible durations, which introduce additional state dimensions that significantly expand the conflict-resolution search space. This letter formulates the problem as Co-MAPF with Action Durations and proposes Cooperative Priority Planning (CPP). CPP maintains high-level constraints and a priority queue to serialize task allocation and optimize meeting vertex selection, while employing a low-level solver that integrates synchronization windows with Time-Constrained A* to couple safe action durations with conflict-free path planning. Experimental results demonstrate that CPP achieves near-linear empirical time complexity and superior scalability. It exhibits near-optimal path quality with a small optimality gap in small-scale scenarios, while scaling more effectively than existing Co-MAPF solvers to constrained scenarios. Moreover, it maintains high success rates in large-scale, constrained scenarios, and ablation studies validate the effectiveness of each component.
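
To make the effect of action durations concrete, here is a minimal sketch of a generic space-time A* on a grid in which a move takes several time steps and a wait takes one. It is not CPP's low-level solver: the reservation table, the rule that a target cell must stay unreserved for the whole action, and all names and parameters are assumptions made for illustration.

```python
import heapq

def space_time_astar(start, goal, free_cells, reserved, move_duration=2, horizon=50):
    """Search over (cell, time) states; moves take `move_duration` steps, waits take 1."""
    def h(cell):
        # Admissible estimate: Manhattan distance times the cost of one move.
        return (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])) * move_duration

    open_heap = [(h(start), 0, start, [(start, 0)])]
    seen = set()
    while open_heap:
        _, t, cell, path = heapq.heappop(open_heap)
        if cell == goal:
            return path
        if (cell, t) in seen or t >= horizon:
            continue
        seen.add((cell, t))
        x, y = cell
        # Candidate actions: wait in place (1 step) or move to a 4-neighbour.
        actions = [(cell, 1)] + [((x + dx, y + dy), move_duration)
                                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        for nxt, dur in actions:
            if nxt not in free_cells:
                continue
            # Assumption: the target cell must be unreserved for the whole action.
            if any((nxt, t + k) in reserved for k in range(1, dur + 1)):
                continue
            heapq.heappush(open_heap,
                           (t + dur + h(nxt), t + dur, nxt, path + [(nxt, t + dur)]))
    return None

# Hypothetical corridor: a higher-priority agent holds cell (1, 0) at times 1 and 2.
corridor = {(x, 0) for x in range(5)}
blocked = {((1, 0), 1), ((1, 0), 2)}
path = space_time_astar((0, 0), (2, 0), corridor, blocked)
free_path = space_time_astar((0, 0), (2, 0), corridor, set())
```

In this toy corridor the agent has to wait two steps before it can start a two-step move into the contested cell, so it arrives at time 6 instead of 4. Time becomes part of the state, which is the kind of search-space growth the abstract attributes to non-instantaneous actions.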

RA-L 2026-05-15

Open-Vocabulary Semantic Segmentation for Dynamic 3D Scenes Using Scene Flow Estimation

You-Jun Li, Yu-Kai Lin, Bang-Shien Chen, Chih-Wei Huang, Jann-Long Chern, Ching-Cherng Sun

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

3D open-vocabulary semantic segmentation has shown great potential in applications such as autonomous driving and mixed reality. However, achieving accurate segmentation in dynamic environments remains challenging due to motion-induced inconsistencies. To address this issue, we incorporate scene flow as temporal information into a static semantic backbone to enhance semantic consistency and accuracy over time. Our method captures inter-frame motion cues from point cloud sequences and leverages them, together with a local clustering mechanism, to refine semantic label consistency in consecutive frames. Furthermore, we introduce a two-way scene flow-based data augmentation strategy that exploits both forward and backward motion to jointly train the model in bidirectional temporal contexts. On the large-scale nuScenes autonomous driving dataset, our method achieves a 0.4% overall improvement in hIoU and a 2.37% gain under high-motion scenes. On the synthetic object-centric dataset, it achieves a 4.53% overall hIoU improvement and a 6.09% gain in high-motion scenes, while reducing the ID switch rate by 0.5%.

RA-L 2026-05-15

A Mole-Inspired Scratch-Digging Robot for Granular Media Traversal

Jiabin Liu, Zhaofeng Liang, Haifei Zhu, Yisheng Guan, Kun Xu, Xilun Ding, et al.

Legged / Quadruped Robots · Multi-Robot / Swarms
Abstract

This letter proposes a mole-inspired scratch-digging robot to investigate four-limb subsurface locomotion in granular media. The robot integrated a conical head, a rigid torso, a hybrid crank-rocker and crank-slider forelimb that reproduces scratch-digging strokes, and a two-degree-of-freedom (DOF) closed-chain five-bar hindlimb for propulsion. The forelimb was optimized in ADAMS using a sequential quadratic programming algorithm to enlarge the toe-tip excavation envelope, increasing the excavated area by 33.1% and yielding a clearer retraction phase. For the hindlimb, Bézier-parameterized end-point trajectories were adopted, and link lengths were optimized via particle swarm optimization simulation, by which the reachable workspace was expanded by 15.22%. A prototype with an embedded motion controller was built (193 × 96 × 58 mm, 411 g). A traversal distance of 359 mm was achieved in loose granular media, and stable semi-buried progression was demonstrated in a denser lunar-regolith simulant using a sequenced gait.

JFR 2026-06-05

A Low‐Drift Legged Robot State‐Estimation System Through Combined Physics‐Informed Contact Estimation Network and Full Joint State

Zhentao Xie, Qinchuan Li, Xiaolong Zhou, Yabin Xu

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving
Abstract

Proprioceptive sensors are crucial for legged robots, as they provide reliable internal state information and are less affected by environmental disturbances. A robust proprioceptive base state estimator is essential for the localization and control capabilities of legged robots. Classical methods for legged robot state estimation often use IMU integration for prediction and the assumption of stationary foot contact for updates. However, they suffer from issues such as IMU accelerometer noise from foot‐end impacts, nonlinear foot‐ground interactions, and sensor parameter uncertainties, which lead to estimation drift. To address these limitations, this paper proposes a novel low‐drift state estimation system for legged robots that combines contact and joint state estimation. Specifically, our method i) proposes a physics‐informed contact estimation network to obtain accurate contact states for legged robots, ii) estimates the joint states of the legged robot and obtains body accelerations computed from joint accelerations, and iii) updates the base position, orientation, and velocity using gravitational acceleration components and the assumption of static contact points. Under standard operating conditions, experiments on both public and private datasets demonstrate that the proposed method outperforms state‐of‐the‐art algorithms.

JFR 2026-06-05

From Flybys to Sample Return: A Review of Space Probes and Robotic Sampling Technologies for Small Bodies

Xin Zhang, Hao Zhou, Dalin Zhou, Zhaojie Ju

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

As a crucial puzzle piece of deep space exploration, exploring small bodies can provide significant scientific insights and valuable mineral resources. Unlike missions to the Moon and Mars, small‐body missions pose distinct technical challenges, including communication delays, weak gravity, and uncertain environments. This paper reviews a full mission spectrum of probe‐based small‐body missions to analyze the current status and core technologies, tracing from flyby detection to orbital observation, impact detection, in‐situ sampling and analysis, and sample‐return missions. Since sampling missions act as a bridge between detection and exploitation, robotic sampling has become one of the most cutting‐edge and core technologies for space robotic probes. Space robots are the synergy of robotics and autonomy: robotics provides the physical capability for sample acquisition, while autonomy enables the robot to perceive environments, make decisions, and plan actions independently for robust and efficient sampling. Therefore, we investigate the state‐of‐the‐art robotic sampling and onboard autonomy technologies, including robotic sampler design, autonomous touchdown, and compliant interaction. Given the significant uncertainties associated with small bodies, we propose a trade‐off cobweb model to evaluate the comprehensive performance of various robotic samplers. Additionally, we review mainstream microgravity simulation technologies for testing robotic samplers on the ground. Finally, we analyze key lessons learned from past missions and discuss directions for future small‐body exploration. This review aims to serve as a comprehensive and useful reference for researchers in the field.

JFR 2026-06-05

Multi‐Robot Collaborative Navigation Framework Based on 3D Voronoi Partitioning in Uneven and Unstructured Environments

Hongyang Zhao, Kunyu Xu, Nuo Xu, Xingdong Li, Jing Jin

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms
Abstract

Currently, research on cooperative navigation of multi‐robot systems in uneven terrain environments is still relatively scarce. The complex and rugged terrain features often pose significant threats to the navigation performance of each robot, thereby presenting considerable challenges for conducting safe and efficient collaborative operations in such environments. This article proposes a multi‐robot cooperative navigation framework aimed at addressing these challenges on uneven road surfaces. Based on three‐dimensional Voronoi partitioning, a terrain‐aware partitioning strategy (TAPM‐IVP) is introduced, enabling each robot to perform real‐time terrain analysis, categorizing the space into traversable space, non‐traversable space, and free space, thus facilitating task allocation in complex uneven terrain. Based on the partition results of TAPM‐IVP, a greedy heuristic sub‐goal decision method is proposed, which selects safe, non‐redundant, and collision‐free target points, employing a hierarchical generation strategy to guide the robotic system in efficiently navigating in unknown environments, thereby significantly enhancing navigation adaptability and task completion efficiency in uneven scenarios. Finally, both simulations and real‐world experiments validate the feasibility, safety, and efficiency of the proposed framework for cooperative navigation in uneven road conditions.
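
The partitioning idea can be illustrated with a discrete 3D Voronoi assignment, where each sampled terrain cell goes to its nearest robot. This is only a sketch of that basic step; the terrain-aware classification into traversable, non-traversable, and free space described for TAPM‐IVP is not modelled, and the robot names and coordinates are hypothetical.

```python
def voronoi_partition(cells, robots):
    """Assign each 3D cell to the nearest robot (squared Euclidean distance)."""
    regions = {name: [] for name in robots}
    for cell in cells:
        nearest = min(
            robots,
            key=lambda n: sum((c - r) ** 2 for c, r in zip(cell, robots[n])),
        )
        regions[nearest].append(cell)
    return regions

# Hypothetical slope sampled at five cells, with two robots at different heights.
robots = {"r1": (0.0, 0.0, 0.0), "r2": (5.0, 0.0, 1.0)}
cells = [(float(x), 0.0, 0.25 * x) for x in range(5)]
regions = voronoi_partition(cells, robots)
```

Because the distance includes the height coordinate, a cell's owner depends on elevation as well as planar position, which is the motivation for partitioning in three dimensions on uneven terrain.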

RA-L 2026-05-14

A Mid-Air Autonomous Battery Exchange Aerial System

Yooseung Choi, Jamie Henson, Kai Xue Keller, Clifford Gamble, Abhishek Kini, Ran Dai

UAVs / Aerial Robots · Control & Dynamics
Abstract

This paper presents an autonomous mid-air battery exchange system that sustains UAV operations without ground returns by docking a supplier drone to a receiver drone and swapping batteries in flight. Our approach includes (1) a hierarchical rendezvous planner that selects a mutually reachable three-dimensional docking point and generates a collision-free, dynamically feasible trajectory for both drones, and (2) a docking guidance framework that combines real-time kinematic (RTK)-based relative positioning, offset-compensated tracking, and empirically tuned altitude control to maintain stability during docking. Once docking is achieved, a battery-switching mechanism on the supplier drone uses a geared stepper motor for a belt system to collect the depleted battery and pass the new battery to the receiver drone. Simulations and outdoor flight tests in moderate wind conditions demonstrated effective rendezvous, stable docking, and successful battery exchange, supporting mid-air swapping as a viable approach to extended endurance.

RA-L 2026-05-14

Three Percent is Enough: Semi-Supervised Martian Segmentation Labeling With Active Learning

Muyao Li, Ruyi Zhou, Liang Ding, Huaiguang Yang, Haibo Gao, Zongquan Deng

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Accurate, large-scale Martian segmentation datasets are a cornerstone of autonomous scene understanding in support of exploration and navigation in Martian environments. However, high-quality segmentation labeling on planetary images requires annotators to have professional extraterrestrial geological knowledge, and even skilled annotators need a long time to label a single image in detail. In this paper, we propose a semi-automatic annotation method for Martian scene segmentation. By integrating a semi-supervised segmentation network architecture with an active learning strategy, our framework achieves near-fully supervised performance using a minimal amount of manual annotations, significantly reducing dependence on human experts. Experiments on the S$^{5}$Mars dataset show that our framework reaches 77.01% mIoU with only 3.13% of the manual annotations (169 images), corresponding to 92.23% of the performance (83.50% mIoU) of the official fully supervised model trained on 5400 labeled images. We also conducted an extended experiment on AI4Mars, where the proposed framework consistently achieved strong performance, exceeding the fully supervised baseline by 8.12% using only 3.12% of labeled data. This annotation ratio is substantially lower than the nearly 20% labeled data typically required by conventional semi-supervised approaches, highlighting the efficiency of our method for large-scale Martian scene annotation. Our project page is available at https://wmr-team.github.io/CPS-Mars-webpage/, and the code will be released upon acceptance.
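
The two headline percentages follow directly from the figures quoted in the abstract. A short check, using only those numbers (the variable names are ours):

```python
labeled, total = 169, 5400             # manually annotated images vs. full training set
miou_semi, miou_full = 77.01, 83.50    # mIoU of the proposed framework vs. fully supervised model

label_fraction = 100 * labeled / total       # share of images annotated by hand
relative_perf = 100 * miou_semi / miou_full  # share of fully supervised mIoU retained

print(f"{label_fraction:.2f}% of labels, {relative_perf:.2f}% of fully supervised mIoU")
# prints "3.13% of labels, 92.23% of fully supervised mIoU"
```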

AuRo 2026-06-10

Decentralized Information-driven Approach for Tracking Multiple Moving Targets with Multi-Robot Networks

Junyi Dong, Sushrut Surve, Cong Liu, Kyung Rak Jang, Suming Qiu, Pingping Zhu, et al.

Multi-Robot / Swarms
Abstract

Significant advances in robot sensing and mobility have enabled the use of multi-robot networks to track multiple targets moving in an environment. Tracking moving targets that outnumber the robots requires the network to collaboratively select a subset of targets to track and to plan the actions of the robots. This paper focuses on the scenario in which the targets outnumber the robots in the network. The goal of the network is to assign targets to the robots and plan robot actions to track these assigned targets. This has to be done consistently as the robots and targets move in the environment to ensure the tracking performance is maximized. However, this problem, as shown in this paper, is NP-hard. This paper leverages decomposition theory to solve this problem efficiently in real-time in two stages. The first stage leverages inter-robot communication in the network to assign robots to targets, and the second stage, solved on each robot locally, optimizes the control to track the targets assigned to it. A novel decentralized approach, called bundle-based assignment, is presented to find adaptive and conflict-free target assignment in the first stage that guarantees a $\frac{1}{2}$-approximation in the worst case. Since robots can be assigned more than one target, the second stage optimizing for control is shown to take the form of a multi-objective control problem with conflicting objectives, and a strategy is proposed to solve it for real-time applications. A novel information-gain-based tracking objective is developed, which can be used to solve the two stages, suited specifically for the scenario under consideration. Simulation results show that the novel approach, called bundle-based assignment and control (BBAC), optimizing the novel tracking objective, outperforms existing algorithms and achieves performance very close to that of the optimal solution in a shorter time. Physical experiments with a network of ground robots tracking human targets further validate the applicability of these approaches in the real world.

JFR 2026-06-10

Automated Lawn Maintenance: An Agronomic and Operational Review of Turf Health, Biodiversity, and Field Performance

Andrea Palladini, Lorenzo Guerrini, Marco Fontanelli, Lucia Bortolini

Navigation / SLAM / Autonomous Driving
Abstract

Grass mowing is one of the most resource‐consuming activities in green maintenance, whether in private areas such as home gardens or in public spaces like urban parks. In recent years, concerns related to climate change, human health, and sustainability have become increasingly prominent in green maintenance, leading manufacturers and industry professionals to explore and promote alternatives to traditional management methods. Battery‐powered machines have gained popularity, significantly contributing to the reduction of fuel consumption and localized emissions. Initially, challenges related to energy autonomy and the limited power output of electric drivetrains were prevalent; however, many of these issues have been successfully addressed in recent years. Simultaneously, significant advances in operational autonomy have been made in lawn mowing with the development of robotic lawnmowers (RLMs). These devices have been maintaining gardens worldwide for over 15 years, achieving high levels of reliability and positioning themselves as strong competitors to conventional gasoline‐powered lawnmowers. This review provides a critical synthesis of the current state of RLMs, considering them as field‐deployed robotic systems operating within complex biological and agro‐ecological environments, detailing the interplay between core robotic systems and operational and environmental performance, covering sensor technology, navigation systems, biodiversity impact, sustainability, and energy consumption. The work identifies systemic gaps in the literature regarding consistent comparison metrics and proposes a structured, multi‐dimensional evaluation framework designed to standardize the benchmarking of RLM hardware and algorithms for future research.

JFR 2026-06-10

Cross‐Modal Synergistic Optimization Multi‐Task Segmentation Network for Autonomous Ground Intelligent Agents in Field Environments

Yifang Huang, Peng Shi, Haitao He, Xiaobing Hao

Perception & Sensing
Abstract

This study focuses on the perception of traversable areas and ground materials in complex field environments, driven by the application demands of autonomous ground intelligent agents in critical tasks such as battlefield support and disaster relief. Compared to existing studies on fine‐grained segmentation in structured environments (e.g., urban roads), perception in unstructured field environments is far more challenging due to irregular terrain and complex conditions, while research in this area remains limited. To address this, this paper proposes a cross‐modal synergistic optimization multi‐task segmentation network (CSOM‐Net) for ground intelligent agents in field environments, based on an in‐depth analysis of the technical requirements in complex field settings. This paper proposes a conflict‐consistency guided feature fusion (CCGF) method to resolve the challenge of feature conflicts in multimodal data that hinder effective fusion in complex field environments. A cross‐modal bidirectional mutual adaptation (CBMA) learning strategy is proposed to address the feature interference caused by significant differences in optimization objectives in cross‐modal tasks under field environments. Extensive comparative and ablation experiments were conducted on both real‐world and simulated complex field datasets to evaluate the proposed CSOM‐Net model. Experimental results show that CSOM‐Net outperforms conventional models, achieving approximately 5% and 2% mIoU improvements in point cloud traversable area and image ground material segmentation tasks, respectively. The model proves effective for joint segmentation tasks of point cloud traversable areas and image ground materials in complex field environments, providing a robust solution to enhance environmental perception capabilities for autonomous ground agents operating under challenging field conditions.

Sci. Robotics 2026-04-15

Demonstrate once, execute on many: Kinematic intelligence for cross-robot skill transfer

Sthithpragya Gupta, Durgesh Haribhau Salunkhe, Aude Billard

Abstract

Teaching robots new skills should be as natural as showing rather than programming. Learning from demonstration (LfD) moves toward this goal by allowing users to guide a robot or sketch a desired motion, enabling learning without writing a line of code. However, most LfD methods remain tied to the robot they were trained on. Changes in morphology, different link lengths, joint orientations, or limits often break the learned behavior, making retraining unavoidable. Here, we introduce a framework that endows robots with kinematic intelligence: an internal understanding of their own joint limits, singularities, and connectivity. Instead of correcting for these constraints after learning, we embedded them directly into the control policy from the outset. The approach takes one or multiple demonstrations, extracts a globally stable dynamical system, and produces behaviors that remain valid across robots with different kinematic structures. Our method is grounded in a comprehensive analytical classification of noncuspidal three-revolute (3R) robots, which form the building blocks of many commercial robots. This classification enables a joint space policy that preserves user intent and adapts to robot-specific constraints. We validated the framework on diverse simulated and real robots, both redundant and nonredundant, with varied link geometries and joint configurations. The demonstrated skill executes safely and consistently across robots without retuning, thereby achieving cross-robot skill transfer.

RA-L 2026-03-26 · Cited by 1

Lightweight Learning From Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping

Liudi Yang, Yang Bai, Yuhao Wang, Ibrahim Alsarraj, Gitta Kutyniok, Zhanchi Wang, et al.

Manipulation & Robot Arms · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Robotic grasping under uncertainty remains a fundamental challenge due to its uncertain and contact-rich nature. Traditional rigid robotic hands, with limited degrees of freedom and compliance, rely on complex model-based and heavy feedback controllers to manage such interactions. Soft robots, by contrast, exhibit embodied mechanical intelligence: their underactuated structures and passive flexibility of their whole body naturally accommodate uncertain contacts and enable adaptive behaviors. To harness this capability, we propose a lightweight actuation-space learning framework that infers distributional control representations for whole-body soft robotic grasping directly from deterministic demonstrations using a flow matching model (Rectified Flow), without requiring dense sensing or heavy control loops. Trained with only 30 demonstrations covering less than 8% of the reachable workspace, the learned policy achieved a 97.5% grasp success rate over 1000 trials in simulation. In real-world experiments on 50 uniformly distributed targets, the policy achieved a 100% success rate, generalized to object size variations from -33% to +100%, and remained stable under execution-time scaling from 20% to 200%. These results demonstrate that actuation-space learning effectively embeds mechanical intelligence into control, significantly reducing reliance on centralized computation for grasping under uncertainty.
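
For readers unfamiliar with Rectified Flow, the core mechanics are a straight-line interpolation between a noise sample and a data sample, a velocity target equal to their difference, and sampling by integrating the learned velocity field. The sketch below shows only those mechanics; the constant velocity function stands in for a trained network, and the two-dimensional "actuation command" is a hypothetical value, not data from the paper.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity field along that straight path."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

noise = [0.0, 0.0]
command = [0.3, 0.8]  # hypothetical demonstrated actuation command

# Stand-in for a trained network: returns the exact target for this single pair.
def oracle_velocity(x, t):
    return target_velocity(noise, command)

midpoint = interpolate(noise, command, 0.5)
sample = euler_sample(noise, oracle_velocity)
```

With a perfectly straight flow the Euler integration lands on the demonstrated command regardless of the number of steps, which is part of why this family of models is attractive for lightweight control.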

RA-L 2026-05-12

CrossTrack-UAV: A Real-World Dataset and Baseline for Cross-View Multi-Object Tracking Using Fixed-Wing UAVs

Chungsu Jang, Youngjung Kim, Sebin Lee, Wonsuk Kwon, Taehyun Kim, Yong-Duk Kim, et al.

UAVs / Aerial Robots · Multi-Robot / Swarms
Abstract

Cross-view multi-object tracking (CVMOT) aims to simultaneously track a group of objects over time in each view and to associate the same objects across different views. This task is essential for swarm monitoring and has attracted considerable attention in the literature. However, most existing datasets focus on static cameras or cameras with stable motion in low-altitude environments, which hinders progress in developing methods for fixed-wing unmanned aerial vehicles (UAVs). To fill the gap, this work presents CrossTrack-UAV: the first real-world and large-scale CVMOT dataset using fixed-wing UAV swarms equipped with 2-axis gimbal cameras. CrossTrack-UAV was collected by four UAVs flying manually, covering five different cities in South Korea. The resulting dataset comprises 53 distinct scenarios and 1,272 cross-view tracks with 1,059,408 annotated 2D bounding boxes for 3 classes. It also contains synchronized metadata from UAVs, including GPS position, altitude, and gimbal orientations essential for extracting 3D geometric features. We introduce a CVMOT baseline built on the multi-sensor joint probabilistic data association filter (M-JPDAF). We extend M-JPDAF with spatial coherence processing to filter noisy and inconsistent data. Extensive experiments on CrossTrack-UAV demonstrate the effectiveness of the proposed baseline and the value of the benchmark.

RA-L 2026-05-12

An Orthogonal-Axis Spherical Parallel Mechanism With Large Workspace for Humanoid Hip Joints: Design, Optimization, and Experimental Validation

Haotian Liu, Zhaoming Liu, Hongwei Wang, Long Cui

Humanoid Robots · Control & Dynamics
Abstract

Conventional serial-chain configurations for humanoid robot lower limbs exhibit issues such as concentrated driving torques, inertial coupling, and error accumulation, leading to inadequate joint stiffness, limited control precision, and reduced dynamic stability. To overcome these drawbacks, this study proposes a novel Orthogonal-axis Spherical Parallel Mechanism, inspired by the ball-and-socket structure of the human hip joint. The kinematic model of the mechanism is established. A multi-objective optimization framework is then constructed, considering workspace coverage, dexterity, and maximum joint torque, and optimal parameters are obtained through weighted decision-making. Simulations and experiments verify that the optimized mechanism fully covers the common range of motion of the human hip joint, demonstrates improved dexterity within common motion ranges, and supports a single-leg payload of 40 kg. Integrating high stiffness, a large workspace, and superior load capability, this design offers a comprehensive solution for humanoid robotic legs.

RA-L 2026-05-12

TGS-SLAM: Tri-Plane Gaussian Splatting for Semantic SLAM

Guoxi Sun, Handong Shen, Xiaohao Liu, Xinchao Li, Lingyu Liang, Beibei Liu, et al.

Manipulation & Robotic Arms · Navigation / SLAM / Autonomous Driving
Abstract

Semantic SLAM aims to build 3D maps that are both geometrically accurate and semantically consistent for downstream tasks such as navigation, manipulation, and AR/VR. Existing dense neural approaches largely follow two paradigms: implicit neural fields, which offer strong cross-view consistency but rely on expensive NeRF-style volumetric rendering, and 3D Gaussian Splatting (3DGS), which is efficient but typically attaches semantics to individual Gaussians, yielding fragmented primitive-level semantics and semantic parameters that scale linearly with the number of primitives. We present TGS-SLAM, a semantic SLAM system that anchors geometry and appearance with 3D Gaussians while encoding semantics in TriDS, a shared coarse-to-fine tri-plane field decoupled from Gaussian primitives. Semantic features are queried at Gaussian centers, decoded into class logits, and splatted jointly with color and depth in a single rendering pass. We further adopt a geometry-first optimization schedule to stabilize the Gaussian map before semantic learning, and a hybrid pose initialization mechanism for robust tracking. Experiments on Replica, ScanNet, and ScanNet++ demonstrate that TGS-SLAM produces coherent semantic maps with competitive tracking and reconstruction quality. On Replica, it achieves 97.0% mIoU while reducing semantic parameters by $\sim 100\times$ compared with per-Gaussian one-hot encodings. Code is available at https://github.com/shand001/TGS-SLAM.

RA-L 2026-05-12

Condition-Number Adaptive-Weight PINN (A-PINN): A High-Fidelity and Real-Time Forward Kinematics Solver for Stewart Platforms

Xinyu Tian, Junlin Xiao, Hang Xu, Xing Hou, Fuhua Jia, Xiaoying Yang, et al.

Robot Learning · Perception & Sensing
Abstract

Parallel kinematic mechanisms (PKMs) are widely adopted in precision and heavy-load applications, yet obtaining the requisite forward kinematics (FK) for closed-loop control remains a challenge. FK in PKMs typically lacks a unique closed-form solution, necessitating iterative numerical solvers that are prone to ill-conditioning near singular configurations. This letter proposes a condition-number adaptive physics-informed neural network that uses the normalized Jacobian condition number to modulate physics-based loss terms. The training objective enforces an implicit Sobolev-type regularization via operator consistency, encouraging kinematic feasibility while avoiding explicit supervision that depends on unstable Jacobian inversion in ill-conditioned regions. Real-world experiments verify 1 kHz operation as a kinematic observer for real-time velocity estimation, confirming the method's foundation for high-rate feedback and gradient-based control pipelines.

RA-L 2026-05-12

Shared Autonomy Assisted by Impedance-Driven Anisotropic Guidance Field

Sihan Chen, Hang Xu, Yupu Lu, Chen Wang, Benfang Duan, Ruixing Jia, et al.

Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract

Shared autonomy (SA) enables robots to infer human intent and assist in its achievement. While most research focuses on improving intent inference, it overlooks whether humans can understand the robot's intent in return. Without such mutual understanding, collaboration becomes less effective, degrading user experience and task performance. To address this gap, previous studies have explicitly conveyed the robot intent through additional interfaces, which remain unintuitive and limited in expressiveness. Inspired by impedance control, we propose Impedance-Driven Anisotropic Guidance Field Enhanced Shared Autonomy (IAGF-SA), a novel paradigm that extends SA with an embodied, physically-grounded communication channel. This channel adaptively modulates the robot's dynamic response to human input, enabling intuitive, continuous, physically-grounded robot intent communication while naturally guiding human actions. User studies across three scenarios and two teleoperation interfaces indicate that IAGF-SA improves task performance, human-robot agreement, and subjective experience, thus demonstrating its effectiveness in enhancing human-robot communication and collaboration.
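
One way to picture a response that "adaptively modulates the robot's dynamic response to human input" is a direction-dependent admittance law: the robot yields easily to pushes along the direction it intends to move and resists pushes across it. The sketch below is an illustration under that assumption only; the function name, the 2-D setting, and the damping values are hypothetical and are not taken from IAGF-SA.

```python
import math

def anisotropic_admittance(force, intent_dir, d_along=2.0, d_across=20.0):
    """Map a human input force to a commanded velocity with direction-dependent damping.

    The force is split into a component along the robot's intended direction and
    a component across it; each is divided by its own damping coefficient.
    """
    norm = math.hypot(intent_dir[0], intent_dir[1])
    u = (intent_dir[0] / norm, intent_dir[1] / norm)  # unit intent direction
    f_along = force[0] * u[0] + force[1] * u[1]       # scalar projection onto u
    along = (f_along * u[0], f_along * u[1])
    across = (force[0] - along[0], force[1] - along[1])
    return (along[0] / d_along + across[0] / d_across,
            along[1] / d_along + across[1] / d_across)

# The same 1 N push moves the robot ten times faster along the intent than across it.
v_along = anisotropic_admittance((1.0, 0.0), (1.0, 0.0))
v_across = anisotropic_admittance((0.0, 1.0), (1.0, 0.0))
```

The operator feels the difference in resistance directly through the interface, which is the sense in which the response itself can carry information about the robot's intent.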

RA-L 2026-03-25 · Cited by 1

Electrical Impedance Tomography and Neural Networks for Shape Sensing in Soft Continuum Endoscopic Robots

Amirhosein Alian, James Avery, George Mylonas

Manipulation & Robotic Arms · Robot Learning · Medical / Soft / Micro-Nano
Abstract

Soft robotics offers biocompatibility, dexterity, and safe tissue interaction in surgery, providing potential alternatives to conventional tools such as colonoscopes. However, their nonlinear behaviour demands closed-loop control with structure-compatible feedback. This work presents a scalable 3D shape-sensing method based on Electrical Impedance Tomography (EIT), unifying actuation and sensing of a hydraulically actuated soft robot within a neural network framework. A soft continuum manipulator (14.6 mm in diameter) with saline-pressurised chambers and embedded kirigami-inspired FPCs was evaluated in free motion and ex vivo porcine colon trials. A multilayer perceptron (MLP) predicted the full 3D shape, achieving tip RMSEs of 0.46, 0.20, and 0.40 mm (x, y, z) in free motion, and 1.96, 0.86, and 0.89 mm ex vivo. This paper marks the first ex vivo validation of EIT-based shape sensing in soft endoscopy and demonstrates its potential for closed-loop surgical control.

JFR 2026-06-02

Design, Development, and Field Testing of a Tomato Bunch Harvesting Robot

Can Xu, Zefeng Xu, Huiling Li, Yitong Zhou

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

With the aging population and labor shortages, the proportion of labor costs in tomato harvesting is increasing, making the development of tomato harvesting robots imperative. This study developed an integrated tomato bunch harvesting robotic system for cherry tomatoes. A combined cutting and gripping end‐effector powered by a single actuator achieved a cutting success rate of 93.33% and a gripping capacity of 1600 g. A parameterized camera arrangement was employed to match the robotic arm's field of view, thereby avoiding mutual interference. A tomato bunch and stalk recognition model was constructed based on the YOLOv4 algorithm to enable precise localization of harvesting points. The proposed tomato bunch‐stalk matching method achieved a recall rate of 99.22%, while the stalk growth‐direction qualitative discrimination method attained an accuracy of 97%. Field experiments demonstrated that the system achieved an average harvesting time of 12.23 s per tomato bunch and an overall harvesting success rate of 70.77% in unstructured environments, improving automation and operational efficiency compared to existing solutions. This research offers a solution integrating hardware optimization and perception algorithms for greenhouse harvesting robots, demonstrating potential for commercial application.

T-RO 2026-04-22

Construction of Generalized Force–Deformation Theoretical Model: Toward Efficient Systematic Optimization of Fin-Ray Effect Grippers

Haotian Guo, Ziyi Zheng, Chen Qiu, Wei Yu, Ye Pan, Huixu Dong

Manipulation & Robotic Arms
Abstract

Soft grippers leveraging the bioinspired fin-ray effect (FRE) have garnered significant attention for handling geometrically complex objects across diverse scenarios. Despite their benefits, the fundamental mechanics governing morphological transformation under external loading remain insufficiently understood, hindering practical grasping performance. To address this gap, this article derives, presents, and validates a generalized theoretical model for FRE employing the co-rotation concept, providing comprehensive insights for design optimization. First, the co-rotation modeling and force-displacement relationship are derived, eliminating the small deformation assumption while providing high-fidelity predictions with enhanced computational efficiency. Second, we present comprehensive insights into FRE gripper design, encompassing univariate ablation studies of four key design parameters and Bayesian-Optimization-integrated structure optimization. Third, the approach is validated via comprehensive simulations and physical experiments. Simulation results demonstrate excellent agreement with finite element analysis, with an average error of 6% and greatly enhanced efficiency, as fast as 0.035 seconds. Physical experiments under point, multi-point, and distributed contact validate the model's generalizability and robustness, with relative deviations within approximately 2-6% under different loading and contact conditions. Finally, grasping evaluation with diverse daily objects demonstrates the framework's capacity to advance compliant robotic systems through synergistic mechanics modeling and data-driven optimization.

RA-L 2026-05-11

A Convergent Continuous Contact Solver With Explicit Separation Time for High-Stiffness Contact

Tianjian Lei, Junpeng Chen, Qifei Li, Jian S. Dai, Yang Pan

Legged / Quadruped Robots · Control & Dynamics
Abstract

Most modern contact solvers in multibody system (MBS) simulators adopt optimization-based relaxations that introduce a finite contact stiffness to enable smooth, invertible dynamics. While robust at moderate stiffness, these methods tend to diverge under the high stiffness required for realistic rigid-body contact, forcing artificial softening that compromises physical fidelity. This article presents a convergent continuous contact solver with explicit separation time, designed to support realistic high-stiffness contact dynamics. The solver derives analytical penetration dynamics under continuous contact, from which the contact separation time is explicitly determined. This enables robust handling of contact interactions with theoretically unbounded contact stiffness. The method is implemented in an open-source MBS simulator and evaluated on bouncing-object benchmarks and quadruped robot locomotion tasks. Experimental results demonstrate convergent and physically consistent simulations at contact stiffness values up to $10^{30}\,\mathrm{N/m}$, showing that the solver handles both low- and high-stiffness contact within a single implementation.
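
To see why an explicit separation time is useful, consider a deliberately simplified case that is an assumption of this note rather than the paper's model: a 1-D point mass hitting an undamped linear spring contact, with gravity ignored. The penetration then follows $\delta(t) = (v_0/\omega)\sin(\omega t)$ with $\omega = \sqrt{k/m}$, so contact ends at $t_{sep} = \pi/\omega$ in closed form.

```python
import math

def separation_time(mass, stiffness):
    """Time from impact to separation for an undamped linear spring contact."""
    return math.pi * math.sqrt(mass / stiffness)

def penetration(t, mass, stiffness, impact_speed):
    """Analytical penetration depth at time t after impact (valid for 0 <= t <= t_sep)."""
    omega = math.sqrt(stiffness / mass)
    return impact_speed / omega * math.sin(omega * t)

# At very high stiffness the whole contact phase lasts far less than any
# practical fixed integration step.
t_stiff = separation_time(1.0, 1e30)
```

For a 1 kg body at a stiffness of 1e30 N/m the contact phase in this toy model lasts on the order of 1e-15 s, so a solver that steps through it numerically would need an impractically small step, whereas a closed-form separation time sidesteps the issue.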

RA-L 2026-05-11

Dynamic Sphere Envelopes and SFF-DMP for Real-Time Obstacle Avoidance in Space Cable Assembly

Xumeng Cheng, Gangfeng Liu, Jie Zhao

Manipulation & Robotic Arms · Control & Dynamics
Abstract

This paper addresses real-time obstacle avoidance and precision manipulation in on-orbit cable disassembly tasks involving flexible, vibrating cables. We propose a control framework that combines dynamic sphere envelopes with a Steering Force Field enhanced Dynamic Movement Primitive (SFF-DMP) formulation. The dynamic sphere envelopes approximate oscillating cables with radius-adaptive virtual obstacles, enabling compact geometric representations of flexible dynamics. The proposed SFF-DMP integrates a Cartesian-space DMP with a steering force field, enabling smooth and reliable obstacle avoidance with low free-space loss, while the joint-space DMP preserves demonstration similarity and tracking accuracy through null-space optimization. Cable dynamics modeling and simulation studies validate the method's capability to avoid time-varying obstacles, and experiments on electrical connector disassembly demonstrate that a single cable-free demonstration can generalize to multiple disturbed configurations. Results show consistent collision-free execution with position error below 0.7 mm and orientation error below 0.011 rad. The proposed approach offers a low-demonstration-cost, high-space-efficiency solution for safe manipulation in dynamic and constrained space environments.
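
For readers unfamiliar with DMPs, the sketch below integrates the standard 1-D DMP transformation system and exposes a hook where an extra acceleration term, such as an obstacle-avoidance force, could be added. It is a generic textbook form under stated assumptions: the learned forcing term is omitted, the `coupling` callback is a hypothetical stand-in, and nothing here reproduces the paper's steering force field.

```python
def rollout_dmp(y0, goal, coupling=None, alpha=25.0, tau=1.0, dt=0.001, duration=2.0):
    """Integrate a 1-D DMP transformation system with forward Euler.

    tau * dz/dt = alpha * (beta * (goal - y) - z) + coupling(y, z)
    tau * dy/dt = z
    """
    beta = alpha / 4.0  # critically damped spring-damper
    y, z = float(y0), 0.0
    path = [y]
    for _ in range(int(round(duration / dt))):
        extra = coupling(y, z) if coupling is not None else 0.0
        dz = (alpha * (beta * (goal - y) - z) + extra) / tau
        dy = z / tau
        z += dz * dt
        y += dy * dt
        path.append(y)
    return path

# Without coupling the state converges to the goal.
free_path = rollout_dmp(0.0, 1.0)

# A constant push shifts the equilibrium by push / (alpha * beta); a practical
# avoidance term would instead vanish away from obstacles so the goal is kept.
pushed_path = rollout_dmp(0.0, 1.0, coupling=lambda y, z: 15.625)
```

The second rollout shows why the shape of the added term matters: any force that does not decay near the goal changes where the motion ends.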

RA-L 2026-05-11

Reactive Motion Generation via Phase-Varying Neural Potential Functions

Ahmet Tekden, Dimitrios Kanoulas, Aude Billard, Yasemin Bekiroglu

Manipulation & Robotic Arms · Control & Dynamics
Abstract

Dynamical systems (DS) methods for Learning-from-Demonstration (LfD) provide stable, continuous policies from few demonstrations. First-order dynamical systems (DS) are effective for many point-to-point and periodic tasks, as long as a unique velocity is defined for each state. For tasks with intersections (e.g., drawing an “8”), extensions such as second-order dynamics or phase variables are often used. However, by incorporating velocity, second-order models become sensitive to disturbances near intersections, as velocity is used to disambiguate motion direction. Moreover, this disambiguation may fail when nearly identical position–velocity pairs correspond to different onward motions. In contrast, phase-based methods rely on open-loop time or phase variables, which limit their ability to recover after perturbations. We introduce Phase-varying Neural Potential Functions (PNPF), an LfD framework that conditions a potential function on a phase variable which is estimated directly from state progression, rather than on open-loop temporal inputs. This phase variable allows the system to handle state revisits, while the learned potential function generates local vector fields for reactive and stable control. PNPF generalizes effectively across point-to-point, periodic, and full 6D motion tasks, outperforms existing baselines on trajectories with intersections, and demonstrates robust performance in real-time robotic manipulation under external disturbances.

RA-L 2026-05-11

Locomotion of an Elastic Snake Robot via Natural Dynamics

Tristan Ehlert, Arne Sachtler, Annika Schmidt, Davide Calzolari, Alin Albu-Schäffer

Legged / Quadruped Robots · Control & Dynamics
Abstract

Nature suggests that exploiting the elasticities and natural dynamics of robotic systems could increase their locomotion efficiency. Prior work on elastic snake robots supports this hypothesis, but has not fully exploited the nonlinear dynamic behavior of the systems. Recent advances in eigenmanifold theory enable a better characterization of the natural dynamics in complex nonlinear systems. This letter investigates if and how the nonlinear natural dynamics of a kinematic elastic snake robot can be used to design efficient gaits. Two types of gaits based on natural dynamics are presented and compared to a state-of-the-art approach using dynamics simulations. The results reveal that a gait generated by switching between two nonlinear normal modes does not improve the locomotion efficiency of the robot. In contrast, gaits based on non-brake periodic trajectories (non-brake orbits) are perfectly efficient in the energy-conservative case. Further simulations with friction reveal that, in a more realistic scenario, non-brake orbit gaits achieve higher efficiency compared to the baseline gait on the rigid system. Overall, the investigation offers promising insights into the design of gaits based on natural dynamics, fostering further research.

RA-L 2026-05-11

Robust by Co-Design: CAD-to-Control Evolutionary Optimization of Jet-Powered Humanoids

Punith Reddy Vanteddu, Davide Gorbani, Antonello Paolino, Giuseppe L’Erario, Hosameldin Awadalla Omer Mohamed, Fabio Bergonti, et al.

UAVs / Aerial Robots · Humanoid Robots
Abstract

This paper presents a co-design and optimization framework for improving robustness in an aerial humanoid equipped with variable turbine configurations. The approach integrates CAD-based modeling, domain randomization, and multi-objective evolutionary optimization to jointly explore turbine placement, jetpack geometry, and controller gains under uncertainty. We generate a family of candidate robot models using Uniform Latin Hypercube sampling and perturb nominal inertial properties and turbine thrust estimation parameters to emulate modeling and actuation uncertainty encountered in practice. We evaluate robustness through randomized assessments of trajectory-tracking and fuel-consumption objectives using an adapted NSGA-II algorithm that incorporates fairness-aware seed selection and multi-seed performance aggregation. Across these stochastic evaluations, the framework identifies design–controller pairs that maintain stable flight behavior while balancing energy efficiency. As all evaluated configurations are derived from CAD-generated models, the resulting solutions ensure physical consistency and manufacturability, providing a structured foundation for robustness-driven co-design of jet-powered aerial humanoid systems.
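
The Latin Hypercube sampling step mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the authors' pipeline: the two design parameters and their bounds are hypothetical placeholders.

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points so each dimension has exactly one sample per equal-width stratum."""
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n_samples strata of [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    return [list(point) for point in zip(*columns)]

# Hypothetical design space: a turbine mounting angle in degrees and a controller gain.
candidates = latin_hypercube(5, [(0.0, 30.0), (0.5, 2.0)], random.Random(1))
```

Compared with plain uniform sampling, this guarantees that every slice of each parameter range is visited once, which helps when each candidate is expensive to evaluate.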

RA-L 2026-05-11

Design, Optimization and Experiment of a Detachable Bi-Segment Hybrid Aerial-Underwater Vehicle With Morphable Arms

Xiqiao Han, Kaixuan Han, Yulin Bai, Ziyang Zhang, Zheng Zeng

UAVs / Aerial Robots · Control & Dynamics
Abstract

Effective ocean observation requires hybrid aerial-underwater vehicles (HAUVs) to operate efficiently across both air and water. However, such cross-domain missions often lead to significant hydro-aerodynamic conflicts and endurance limitations caused by structural redundancy. This letter proposes Nezha-Split, a morphing detachable bi-segment HAUV that performs repeated cross-domain maneuvers and supports on-demand jettisoning of its positively buoyant, retrievable aquatic segment to eliminate structural redundancy and extend aerial endurance under demanding conditions. The vehicle features a multifunctional folding mechanism where the rotor arms double as an active center of gravity (CoG) adjustment system, achieving robust passive stability across four distinct configurations introduced by the detachable design. To balance the conflicting requirements of different configurations, a multi-objective optimization (MOO) framework is implemented. Furthermore, a squid-inspired tri-fin nose cone enhances roll stability and downward lift underwater, achieving an average 88% improvement in lift-to-drag ratio. Experiments from pool and flight tests confirm that the detachable architecture leads to a 42.2% reduction in hover power consumption while ensuring smooth water-air transitions and underwater depth-keeping. This work provides a novel methodology for developing high-efficiency, multi-domain robotic systems with on-demand structural reconfiguration.

RA-L 2026-05-11

Diffusion-Based Restoration for Multi-Modal 3D Object Detection in Adverse Weather

Zhijian He, Feifei Liu, Yuwei Li, Zhanpeng Luo, Jintao Cheng, Xieyuanli Chen, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Multi-modal 3D object detection is important for reliable perception in robotics and autonomous driving. However, its effectiveness remains limited under adverse weather conditions due to weather-induced distortions and misalignment between different data modalities. In this work, we propose DiffFusion, a novel framework designed to enhance robustness in challenging weather through diffusion-based restoration and adaptive cross-modal fusion. Our key insight is that diffusion models possess strong capabilities for denoising and generating data that can adapt to various weather conditions. Building on this, DiffFusion introduces Diffusion-IR, which restores images degraded by weather effects, and Point Cloud Restoration (PCR), which compensates for corrupted LiDAR data using image object cues. To tackle misalignments between the two modalities, we develop the Bidirectional Adaptive Fusion and Alignment Module (BAFAM). It enables dynamic multi-modal fusion and bidirectional bird's-eye view (BEV) alignment to maintain consistent spatial correspondence. Extensive experiments on three public datasets show that DiffFusion achieves state-of-the-art robustness under adverse weather while preserving strong clean-data performance. Zero-shot results on the real-world DENSE dataset further validate its generalization. The implementation of DiffFusion has been released at https://github.com/SCNU-RISLAB/DiffFusion.

RA-L 2026-05-11

Equivariant Filter for Radar-Inertial Odometry

Giulio Delama, Jan Michalczyk, Morten Nissov, Martin Scheiber, Alessandro Fornasier, Kostas Alexis, et al.

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving
Abstract

Radar-Inertial Odometry (RIO) based on the Extended Kalman Filter (EKF) relies on accurate extrinsic calibration between the radar and the Inertial Measurement Unit (IMU) and is sensitive to disturbances, as large linearization errors can degrade performance or even cause divergence. To address these limitations, this letter proposes an Equivariant Filter (EqF) for RIO based on a Lie group symmetry that geometrically couples navigation states and IMU biases, extending it to incorporate radar-IMU extrinsic calibration and multi-state constraint updates. This equivariant formulation inherently preserves consistency and enhances robustness, enabling reliable state estimation even under poor or completely wrong initialization of calibration states. Real-world experiments on two different Uncrewed Aerial Vehicles (UAVs) show that the proposed EqF-RIO achieves state-of-the-art accuracy under correct extrinsic calibration and offers improved convergence under large calibration errors, where the conventional EKF-RIO fails. Evaluation code is open-sourced.

RA-L 2026-05-11

Side-Scan Sonar SLAM Using Ping-Level Landmark Detection in Feature-Poor Seabed Environments

Jinho Im, Seonghun Hong

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Side-scan sonar (SSS) is a particularly attractive sensing modality for underwater simultaneous localization and mapping (SLAM), offering wide-area seabed coverage and reliable acoustic measurements over long ranges. Many existing SSS-based SLAM approaches rely on image-domain processing, leveraging acoustic images formed by stacking one-dimensional sonar pings. However, such approaches inherently depend on the availability of sufficiently rich image-domain features and can therefore struggle in feature-poor or homogeneous seabed environments where distinctive visual structures are limited. Furthermore, even when image-domain features are present, range-dependent intensity variations and speckle noise inherent in SSS measurements can degrade the reliability of feature extraction and data association. This study proposes a ping-level SSS SLAM framework that directly exploits raw backscatter intensity profiles without relying on image formation. By characterizing the nominal seafloor response and identifying structurally salient deviations in the acoustic intensity profiles, reliable landmark measurements are extracted at the ping level and incorporated into a landmark-based SLAM framework. This formulation preserves the native sensing geometry of SSS measurements and enables robust landmark extraction even in feature-sparse environments. The proposed approach is validated through both physics-based simulations and real-world field experiments, demonstrating improved robustness and localization accuracy in challenging seabed conditions.
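
The idea of "characterizing the nominal seafloor response and identifying structurally salient deviations" at the ping level can be illustrated with a simple robust-statistics sketch. Everything specific here is an assumption for illustration: the per-bin median as the nominal model, the MAD-based score, the threshold, the noise floor `min_scale`, and the synthetic pings. The paper's actual detector may differ substantially.

```python
import statistics

def nominal_profile(pings):
    """Per-range-bin median over pings, used as the nominal seafloor response."""
    return [statistics.median(column) for column in zip(*pings)]

def detect_landmark_bins(ping, nominal, threshold=5.0, min_scale=0.1):
    """Return range-bin indices whose residual from the nominal response is a robust outlier."""
    residuals = [p - n for p, n in zip(ping, nominal)]
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    # 1.4826 * MAD estimates the standard deviation for Gaussian noise;
    # min_scale keeps the score finite when a ping matches the nominal exactly.
    scale = max(1.4826 * mad, min_scale)
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]

# Synthetic intensity profiles that decay with range; the last ping has a
# strong return in range bin 3.
pings = [
    [10.0, 8.1, 6.0, 5.0, 4.1, 3.0],
    [10.1, 8.0, 6.1, 5.1, 4.0, 3.1],
    [9.9, 7.9, 5.9, 4.9, 3.9, 2.9],
    [10.0, 8.0, 6.0, 5.0, 4.0, 3.0],
    [10.1, 8.0, 5.9, 9.0, 4.1, 3.0],
]
nominal = nominal_profile(pings)
hits = detect_landmark_bins(pings[4], nominal)
```

Working on residuals from a range-dependent nominal profile, rather than on raw intensities, is what lets a fixed threshold cope with the intensity falloff the abstract mentions.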

RA-L 2026-05-11

Multi-View Projection-Based Self-Interference Detection and Interfering Path Point Optimization for Manipulators

Xiao Zhang, Xueting Hu

Manipulation & Robotic Arms · Perception & Sensing
Abstract

When manipulators perform non-repetitive tasks in dynamic environments, the generated trajectories are often highly nonlinear and difficult to verify in advance, which increases the risk of self-interference during execution. Existing studies mainly rely on detecting abrupt changes in physical signals after interference occurs, which may cause structural impact and damage. Geometric modeling approaches based on simplified bounding structures can provide predictive detection, but their conservative representations often introduce redundant envelope space and reduce motion flexibility. To address these limitations, this paper proposes a manipulator self-interference detection and path optimization method based on multi-view projection. First, an equivalent-volume representation of the manipulator is constructed by projecting the three-dimensional structure onto feature-sensitive planes and extending contour edge points along normal directions to form a compact three-layer key-point set. Then, the separability of projected key-point sets on interference-discriminative projection planes is evaluated through a geometric discrimination function to determine potential self-interference at path points. For the path points identified as interfering, a local iterative adjustment strategy based on the separating line is further applied to modify the path while preserving the original path geometry as much as possible. Simulation and experimental results demonstrate that the proposed method effectively improves self-interference detection reliability and path optimization efficiency, showing strong potential for practical industrial applications.

T-RO 2026-04-27

Model-Free Magnetic Servoing for Pose Control of Capsule Robots

Chang Liu, Xiaoyang Wu, Jiaole Wang, Shuang Song

Abstract

This paper proposes a model-free magnetic servoing technique for closed-loop pose control of capsule robots. The system employs an internal permanent magnet (IPM) inside the capsule and an external permanent magnet (EPM) manipulated by a robotic arm. Control feedback is directly derived from magnetic field measurements (via a sensor array) and optical tracking of the EPM's pose. To enable model-free control, we develop a feature extraction method that condenses the IPM's magnetic sensor data into a compact representation. Based on this magnetic feature, we derive a Jacobian matrix that directly maps the feature space to the robotic arm's joint configuration space. To adaptively handle system nonlinearities, the Jacobian matrix is iteratively updated using an unscented Kalman filter, bypassing the need for an explicit dynamic model. Experimental results demonstrate the effectiveness of our approach, with average tracking errors of 0.62 mm in position and 0.75$^\circ$ in orientation during simultaneous pose control tasks.

T-RO 2026-04-21

Complete Autonomous Robotic Nasopharyngeal Swab System With Evaluation on a Stochastically Moving Phantom Head

Peter Q. Lee, John S. Zelek, Katja Mombaur

Manipulation & Robotic Arms
Abstract

The application of autonomous robotics to close-contact healthcare tasks, such as the nasopharyngeal (NP) swab test, has clear potential for reducing infection risks to staff and improving efficiency. We propose a control system that performs the NP swab test with a collaborative manipulator arm, guided by an instrumented end-effector that measures force and visual information. We assume a scenario where the patient is unrestrained, with hardware general enough for other types of close contact tasks. The system employs visual servo control to align the swab with the nostrils. A compliant joint velocity controller inserts the swab into the nasal cavity, following a planned trajectory adjusted with force-feedback. Fuzzy logic systems are designed to detect when the swab reaches the nasopharynx and enforce safety criteria. We validate the system using a second robotic arm that holds a nasal cavity phantom and simulates natural head motions. Extensive experiments identify controller configurations capable of effectively performing the NP swab test even with significant head motion.

T-RO 2026-04-21

Edge Nearest Neighbor: Neighbor-Finding Revisited in Sampling-Based Motion Planning

Stav Ashur, Nancy M. Amato, Sariel Har-Peled

Navigation / SLAM / Autonomous Driving
Abstract

Neighborhood finders and nearest neighbor queries are fundamental components of sampling-based motion planning (SBMP) algorithms. Using different distance metrics or otherwise changing the definition of a neighborhood produces different algorithms with unique empirical and theoretical properties. In his textbook on planning algorithms, LaValle suggests a neighborhood finder for the Rapidly-exploring Random Tree (RRT) algorithm, which finds the nearest neighbor of the sampled point on the swath of the tree, that is, on the set of all of the points on the tree edges, using a hierarchical data structure. In this paper, we implement such a neighborhood finder and show, theoretically and experimentally, that this results in more efficient algorithms.
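
The difference between a vertex-based and an edge-based nearest neighbor query is easy to show with a brute-force sketch. Note the swap: the abstract refers to a hierarchical data structure, whereas this illustration simply scans every edge, and the function names are hypothetical.

```python
import math

def closest_point_on_segment(p, a, b):
    """Orthogonal projection of p onto segment ab, clamped to the segment."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        s = 0.0  # degenerate edge: both endpoints coincide
    else:
        s = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    return [ai + s * c for ai, c in zip(a, ab)]

def nearest_on_swath(p, edges):
    """Linear scan over tree edges; returns (distance, point, edge index)."""
    best = None
    for idx, (a, b) in enumerate(edges):
        q = closest_point_on_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, q, idx)
    return best

# A two-edge tree: the sample (2, 1) is 1.0 away from the first edge but about
# 2.24 away from the closest tree vertex.
tree_edges = [((0.0, 0.0), (4.0, 0.0)), ((4.0, 0.0), (4.0, 3.0))]
dist, point, edge_idx = nearest_on_swath((2.0, 1.0), tree_edges)
```

The scan costs time linear in the number of edges per query, which is why an efficient data structure matters once the tree grows.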

T-RO 2026-04-21

QuadricsReg: Large-Scale Point Cloud Registration Using Semantic Quadric Primitives

Ji Wu, Huai Yu, Shu Han, Ximeng Cai, Mingfeng Wang, Wen Yang, et al.

Perception & Sensing
Abstract

Designing an effective and scalable scene primitive representation is fundamental for large-scale point cloud registration. Existing studies that predominantly rely on dense point clouds or single-type geometric primitives struggle to scale to scenes characterized by massive data volume, structural diversity, and wide viewpoint variations. To address these, this paper introduces QuadricsReg, a novel point cloud registration framework based on quadric primitives for large-scale environments. We compactly model diverse scene structures within a unified semantic quadric formulation, achieving high compression while preserving geometric richness and discriminability. This representation enables efficient quadric matching initialization via intrinsic similarity and robust correspondence pruning by maximizing geometric consistency in a multi-level graph, ensuring reliable associations even under large viewpoint variations. Furthermore, we design a factor graph based on degeneracy-aware quadric residual to estimate the transformation, ensuring accurate alignment in heterogeneous scenes. We evaluate QuadricsReg on 5 public datasets, where its exceptional registration performance with low overhead demonstrates strong scalability for large-scale scenarios. With a compact representation of $\sim 29.5\,\mathrm{KB/scan}$ on KITTI, nearly 100% registration success rate is achieved for point cloud pairs within $10\,\mathrm{m}$. Real-world testing on the self-collected dataset further validates its robustness and generalization ability across different LiDAR sensors and robot platforms.

JFR 2026-04-12 · Cited by 2

4SWLR: A Switched System and Skid Steer Integrated Whole‐Body Control Framework for Wheeled‐Legged Robots

Mingfan Xu, Ziyi Yang, Chuyan Xu, Jing Zhao, Yechen Qin

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Inspired by mammalian locomotion and vehicle skid‐steering principles, this paper proposes a real‐time motion planning and tracking control framework for wheeled‐legged robots, integrating the obstacle‐crossing capability of legged robots with the skid‐steering mechanism of wheeled platforms. Unlike conventional wheeled‐legged robot control methods that rely on external swing joints, the proposed framework leverages differential wheel actuation while comprehensively accounting for the robot‐environment coupling effects under high‐speed conditions, enabling efficient and stable high‐speed steering. First, a hierarchical wheel‐terrain contact dynamics model and a skid‐steering kinematics model are established for wheeled‐legged robots with skid‐steering. By combining switched‐system skid‐steering kinematics with refined wheel–environment interaction dynamics, the framework effectively addresses active wheel torque control during high‐speed steering. Second, a skid‐steering‐based motion paradigm is introduced, which co‐optimizes legged dynamics and wheeled skid‐steering kinematics, eliminating the need for continuous leg‐lifting maneuvers to generate lateral forces and ensuring smooth high‐speed steering. Finally, extensive experiments conducted in challenging environments—including staircases, trenches, ramps, single‐side bridges, and unpaved terrains—validate the robustness and efficacy of the proposed approach. Comparative studies with state‐of‐the‐art wheeled‐legged control methods further demonstrate the superior mobility performance and enhanced wheel–terrain interaction dynamics achieved by our framework.

JFR 2026-05-25

Deep Reinforcement Learning Based Autonomous Decision‐Making for Cooperative Uncrewed Aerial Vehicles: A Search and Rescue Real World Application

Thomas Hickling, Maxwell Hogan, Abdulla Tammam, Nabil Aouf

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Multi-Robot / Swarms · Control & Dynamics
Abstract

This paper presents the first end‐to‐end framework that combines guidance, navigation, and centralized task allocation for multiple UAVs performing autonomous search‐and‐rescue (SAR) in GNSS‐denied indoor environments. A twin delayed deep deterministic policy gradient controller is trained with an artificial potential field (APF) reward that blends attractive and repulsive potentials with continuous control, accelerating convergence and yielding smoother, safer trajectories than distance‐only baselines. Collaborative mission assignment is solved by a deep Graph Attention Network that, at each decision step, reasons over the drone‐task graph to produce near‐optimal allocations with negligible on‐board compute. To arrest the notorious Z‐drift of indoor LiDAR‐SLAM, we fuse depth‐camera altimetry with IMU vertical velocity in a lightweight complementary filter, giving centimeter‐level altitude stability without external beacons. The resulting system was deployed on two 1 m‐class quad‐rotors and flight‐tested in a cluttered, multi‐level disaster mock‐up designed for the NATO‐Sapience Autonomous Cooperative Drone Competition. Compared with prior DRL guidance that remains largely in simulation, our framework demonstrates an ability to navigate complex indoor environments, securing first place in the 2024 event. These results demonstrate that APF‐shaped DRL and GAT‐driven cooperation can translate to reliable real‐world SAR operations.
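
The altitude-fusion step described above can be illustrated with a generic first-order complementary filter. This is only a sketch under stated assumptions: the blending weight `alpha`, the update form, and the function name are hypothetical, and the paper's actual filter design is not given in the abstract.

```python
def complementary_altitude(z0, measurements, alpha=0.98):
    """Fuse IMU vertical velocity with depth-camera altimetry.

    measurements: iterable of (dt, imu_vz, depth_alt) tuples.
    alpha (hypothetical tuning value) weights the integrated IMU prediction;
    (1 - alpha) pulls the estimate toward the depth-camera altitude.
    """
    z = z0
    history = []
    for dt, imu_vz, depth_alt in measurements:
        predicted = z + imu_vz * dt  # short-term: integrate vertical velocity
        z = alpha * predicted + (1.0 - alpha) * depth_alt  # long-term: altimetry
        history.append(z)
    return history

# Hover at 1.5 m with a 0.05 m/s velocity bias: pure integration would drift
# by 1 m over 20 s, while the fused estimate stays within a few centimeters.
samples = [(0.01, 0.05, 1.5)] * 2000
estimate = complementary_altitude(1.5, samples)[-1]
```

With these numbers the estimate settles at the altimetry reading plus a bounded offset of `alpha * bias * dt / (1 - alpha)`, about 2.5 cm, instead of drifting without bound.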

JFR 2026-05-25

Pose Estimation Accuracy Improvement Using Different Orientation Representations With Neural Networks: Case Study for the VIVE HTC Tracker

Sławomir Romaniuk, Milica Petrović, Adam Wolniakowski, Roman Trochimczuk, Grzegorz Masłowski

Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

In robotic applications, the HTC VIVE tracker is frequently used for Learning from Demonstration. This solution and similar devices are not industrial‐grade, meaning that their accuracy in tracking movements in three‐dimensional space needs to be refined to ensure it is sufficient for programming precise robot positioning tasks. Several different methods are often used to improve position tracking accuracy, such as polynomial correction and neural networks. Various methods of parameterizing position and orientation are also frequently used, such as Euler angles or quaternions, although trade‐offs between compactness and numerical stability mean that they are suitable for different correction methods. In this study, we explore four different ways of representing the pose orientation (axis‐angle representation, quaternion, rotation matrix, Zhou representation) for the purpose of implementing pose tracking accuracy correction for the HTC VIVE tracker. The paper investigates the applicability of these representations for the neural network correction methods and compares the results with the classical polynomial correction method. As part of the study, three experiments were conducted involving measurements of the actual and measured positions of the robot and the tracker. An industrial UR5e robot arm from Universal Robots was used as the reference system for collecting measurement data, with an HTC VIVE tracker mounted on its wrist. The obtained results confirm that both the representation and the neural network architecture significantly influence the calibration effectiveness of the HTC VIVE tracker. The results of the experiments showed that the Zhou parametrization of the orientation, combined with neural network rectification, performs best and results in a 17‐fold improvement in the pose estimation accuracy of the HTC VIVE tracking system. The PMC algorithm offers a valuable alternative when fast calibration is required, providing significant accuracy improvements with minimal computational cost.
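
For readers unfamiliar with the orientation parameterizations compared here, the sketch below shows one of them. It assumes that "Zhou representation" refers to the continuous 6D rotation representation (two 3-vectors orthonormalized by Gram-Schmidt), which the abstract does not spell out. The function names are hypothetical.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(r6):
    """Map a 6-vector (two stacked 3-vectors) to a 3x3 rotation matrix.

    Assumed convention: the first three numbers give the first column up to
    scale; the last three are orthogonalized against it (Gram-Schmidt);
    the third column is their cross product.
    """
    a1, a2 = r6[:3], r6[3:]
    b1 = _normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = _normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    # Assemble the matrix with b1, b2, b3 as columns.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]

# An unnormalized, non-orthogonal 6-vector still yields a valid rotation.
R = rotation_from_6d([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
```

Because any generic 6-vector maps to a valid rotation, a network can regress the six numbers freely without a unit-norm constraint, which is one commonly cited motivation for this parameterization.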

RA-L 2026-05-15

BurnDC: A Progressive Propagation Framework for Low Coverage Depth Completion

Zhengyu Zhu, Cong Zhang, Hongmin Liu, Bin Fan

Perception & Sensing
Abstract

The growing adoption of compact and cost-effective solid-state LiDARs has greatly advanced robotics. However, their inherently limited Field-of-View (FOV) hinders their application in tasks requiring wide-range depth perception. To overcome this limitation, we introduce the Low Coverage Depth Completion (LCDC) task, which aims to generate a full-scene depth map from a low coverage sparse depth map and a corresponding RGB image, and propose a tailored framework named BurnDC. BurnDC progressively expands the propagation frontier around reliable depth anchors via Progressive Depth Burn (PDB) and utilizes Weighted Ring Attention (WRA) to inject stable geometric context into boundary regions, achieving a controlled refinement and completion of the entire depth map. To evaluate LCDC, we construct a real-world solid-state LiDAR based benchmark, LC-TIERS, and simulate low coverage settings in NYUv2 and KITTI. Experimental results demonstrate that BurnDC significantly outperforms existing methods in low coverage scenarios, with an RMSE reduction of 10-20% over top competitors. Our work provides a promising solution for unlocking the full potential of solid-state LiDARs in various applications. Code is available at https://github.com/yudmoe/burnDC/.

RA-L 2026-05-15

SEED: Structure-Entropy and DCT Enhanced Descriptor for Robust 4D Radar Place Recognition

Feipeng Chen, Lihui Wang, Zehua Ying, Song Xue, Kui Wang, Xueyong Xu, et al.

Navigation / SLAM / Autonomous Driving
Abstract

Reliable place recognition in adverse weather remains a critical challenge for autonomous navigation. While 4D millimeter-wave radar provides robust sensing capabilities, its data is characterized by sparsity, multipath noise, and measurement uncertainty. To address these challenges, we propose SEED, a two-stage method for 4D radar place recognition. We first organize static radar returns in a polar voxel map and compute an entropy-weighted feature that combines density and geometric cues to suppress unstable clutter while enhancing informative structures. We then project the voxel map into range-elevation and range-azimuth matrices, from which compact descriptors are extracted using truncated DCT. A hierarchical search first performs coarse retrieval and then refines matches. Experiments on multiple 4D radar datasets, including snowy and unstructured scenes, show that SEED outperforms the compared baselines while maintaining real-time performance on embedded hardware.
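
The descriptor-compression idea (keeping only low-frequency DCT coefficients of a projected matrix) can be sketched as follows. This is an illustrative, unnormalized 2D DCT-II in plain Python. The matrix size, the number of retained coefficients, and the function name are assumptions, and SEED's actual normalization and truncation pattern are not stated in the abstract.

```python
import math

def dct2_truncated(matrix, keep):
    """Return the top-left keep x keep block of the unnormalized 2D DCT-II.

    matrix: list of equal-length rows (e.g. a range-azimuth feature map).
    The result is flattened row by row into a compact descriptor.
    """
    rows, cols = len(matrix), len(matrix[0])
    coeffs = []
    for u in range(keep):
        for v in range(keep):
            s = 0.0
            for i in range(rows):
                ci = math.cos(math.pi * (i + 0.5) * u / rows)
                for j in range(cols):
                    cj = math.cos(math.pi * (j + 0.5) * v / cols)
                    s += matrix[i][j] * ci * cj
            coeffs.append(s)
    return coeffs

# Hypothetical 4x4 feature map compressed to a 4-element descriptor.
feature_map = [[1.0, 2.0, 2.0, 1.0],
               [0.0, 3.0, 3.0, 0.0],
               [0.0, 3.0, 3.0, 0.0],
               [1.0, 2.0, 2.0, 1.0]]
descriptor = dct2_truncated(feature_map, keep=2)
```

Low-frequency coefficients summarize the coarse layout of the map, so two scans of the same place can be compared with a simple vector distance before any finer matching stage.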

RA-L 2026-05-15

Learning From Multi-Quality Demonstrations in Dynamic Movement Primitives

Hao Jiang, Jianping He, Xiaoming Duan

Control & Dynamics
Abstract

Learning from Demonstrations (LfD) is an attractive paradigm in robot learning, enabling robots to acquire skills by observing human demonstrations without explicit programming. However, existing LfD approaches typically assume that users provide ideal demonstrations, which rarely holds in practice, especially as non-expert inputs often vary in quality. Such multi-quality demonstrations can cause instability in LfD models and produce outputs that deviate from the desired behavior. To address this, recent studies have improved high-level LfD approaches with notable success, whereas low-level approaches represented by the Dynamic Movement Primitives (DMPs) have received limited attention. In this letter, we propose a novel method that enables DMPs to effectively learn from multi-quality demonstrations, capturing user intents and mitigating quality inconsistencies. Specifically, the proposed method combines Dynamic Time Warping (DTW) with a representation learning model (TS2Vec) for unsupervised identification, estimating the relative qualities of the DMPs' forcing terms and assigning scores. Then, when modeling these terms with a Gaussian Mixture Model, we introduce a latent variable representing the desired forcing term and formulate a weighted joint Maximum A Posteriori objective, enabling reliable modeling guided by the identified scores. Simulation and experimental results show that our method enables DMPs to produce outputs closer to the desired behavior, with improvements in compactness ($18\times$), smoothness ($20\times$), and similarity to the desired demonstration ($5\times$).
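
Dynamic Time Warping, one of the two ingredients named above, compares sequences that differ in timing. The following is a textbook DTW distance for 1D sequences, given only as background. How the paper combines DTW with TS2Vec to score forcing terms is not described in the abstract, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A demonstration that lingers on one sample is still a perfect match.
same_shape = dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
offset = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Here `same_shape` is zero because warping absorbs the repeated sample, while `offset` is positive because the two sequences differ in value rather than in timing.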

RA-L 2026-05-15

Certifiable Alignment of GNSS and Local Frames via Lagrangian Duality

Baoshan Song, Matthew Giamou, Penggao Yan, Chunxi Xia, Li-Ta Hsu

Navigation / SLAM / Autonomous Driving
Abstract

Estimating the absolute orientation of a local system relative to a global navigation satellite system (GNSS) reference often suffers from local minima and high dependency on satellite availability. Existing methods for this alignment task rely on abundant satellites unavailable in GNSS-degraded environments, or use local optimization methods which cannot guarantee the optimality of a solution. This work proposes a certifiable method, meaning it can numerically verify the optimality of the result, filling a gap where existing local optimizers fail. We first formulate the original Doppler-based frame alignment problem as a nonconvex quadratically constrained quadratic program (QCQP) and relax it to a concave Lagrangian dual problem that provides a lower cost bound for the original problem. Then we perform relaxation tightness and observability analysis to derive criteria for certifiable optimality of the solution. Finally, simulation and real-world experiments are conducted to evaluate the proposed method. The experiments show that our method provides certifiably optimal solutions even with only 2 satellites with Doppler measurements and 2D vehicle motion, while the traditional velocity-based VOBA method and the advanced GVINS alignment technique may fail or converge to local optima without notice.

RA-L 2026-05-14

Weighted Online Koopman Learning for Model Predictive Control of Thrust-Vectored Underwater Vehicles

Yizong Chen, Zhiqiang Miao, Jun Wei, Yaonan Wang

Control & Dynamics
Abstract

Underwater vehicles face control challenges including hydrodynamic parameter uncertainties and environmental disturbances. While Koopman-based model predictive control (Koopman-MPC) offers computational efficiency through linear lifted representations, existing methods rely on offline-collected datasets and thus struggle to adapt to the aforementioned challenges. This letter proposes a weighted online Koopman-MPC framework that achieves trajectory tracking through online learning. First, we construct physics-informed basis functions, explicitly embedding the Coriolis, damping, and restoring force terms into the Koopman lifted space. Second, a weighted incremental Extended Dynamic Mode Decomposition (EDMD) scheme fuses offline data with real-time measurements via a tunable parameter. Finally, an incremental MPC scheme is designed to optimize the control increments to satisfy the actuator rate constraints. Experimental validation on a thrust-vectored underwater vehicle demonstrates that the proposed framework effectively adapts to changing dynamics during operation, improving trajectory tracking accuracy over both the offline model and uniform online updates.

RA-L 2026-05-14

Resolving State Ambiguity in Robot Manipulation via Adaptive Working Memory Recoding

Qingda Hu, Ziheng Qiu, Zijun Xu, Kaizhao Zhang, Xizhou Bu, Zuolei Sun, et al.

Manipulation & Robot Arms
Abstract

State ambiguity is common in robotic manipulation. Identical observations may correspond to multiple valid behavior trajectories. The visuomotor policy must correctly extract the appropriate types and levels of information from the history to identify the current task phase. However, naively extending the history window is computationally expensive and may cause severe overfitting. Inspired by the continuous nature of human reasoning and the recoding of working memory, we introduce PAM, a novel visuomotor policy equipped with adaptive working memory. With minimal additional training cost in a two-stage manner, PAM supports a 300-frame history window while maintaining high inference speed. Specifically, a hierarchical frame feature extractor yields two distinct representations for motion primitives and temporal disambiguation. The latter, namely the context features, are cached during inference and serve as the working memory in PAM. For compact representation, a context router with range-specific queries is employed to produce compact context features across multiple history lengths. We design seven real-world tasks covering diverse forms of state ambiguity and show that PAM can handle them within a unified policy. With a history window of approximately 10 seconds, PAM still supports stable training and maintains inference speeds above 20 Hz.

RA-L 2026-05-14

Fault-Tolerant NMPC With Safety Guarantees for Underwater Vehicles

Jun Wei, Zhiqiang Miao, Jinbao Zhang, Yizong Chen, Yaonan Wang

Control & Dynamics
Abstract

This letter proposes a fault-tolerant control (FTC) framework for thruster-actuated underwater vehicles operating under thruster failures and spatial safety constraints. The framework integrates a nonlinear model predictive controller (NMPC) based on unit quaternion attitude parameterization with High-Order Control Barrier Functions (HOCBFs). The quaternion formulation eliminates gimbal lock singularities, thereby enabling stable attitude control during large-angle maneuvers. The HOCBF constraints guarantee that the vehicle remains within designated operational regions through forward invariance. By explicitly embedding thruster health into both NMPC optimization and HOCBF constraints, the framework enables real-time control reallocation among healthy thrusters while preserving safety guarantees. Experimental validation on a BlueROV2 Heavy platform demonstrates stable trajectory tracking with guaranteed constraint satisfaction under progressive thruster failures, singularity-free operation, and enforcement of workspace boundaries and obstacle avoidance constraints.

RA-L 2026-05-14

Robust Dual-Modal Grasping Using a Stiffness-Switchable Adhesive Array

Lvzhou Li, Kai Zhu, Qingyue Li, Xiangli Wen, Xu Dong, Yaoyao Jiang, et al.

Manipulation & Robot Arms
Abstract

This letter presents a dual-modal adaptive gripper (DMAG) that addresses the persistent trilemma in robotic grasping among shape adaptability, high load capacity, and environmental resilience. By introducing a temporally decoupled “conform-then-lock” paradigm, the DMAG separates geometric conformation from force transmission through a fully mechanical, media-agnostic locking mechanism. Inspired by the stiffness modulation of anemones, it translates this biological principle into a fully mechanical architecture, enabling a modular adhesive array to switch between compliant and rigid states. Experimental validation confirms: (1) a 111-fold stiffness modulation achieving over $100\%$ grasp force enhancement; (2) sustained functionality from atmospheric pressure to low-vacuum ( $1.3\times 10^{2}$ Pa); (3) $>95\%$ grasp success rate under $20\%$ simulated unit failure; and (4) reliable grasping across two orders of magnitude in object scale. The DMAG establishes an integrated hardware architecture that reconciles the complementary strengths of soft, articulated, and adhesive grippers while overcoming their inherent limitations.

RA-L 2026-05-14

Multi-Session SLAM for Imaging Sonar Equipped Underwater Vehicles Using Semantic Scene Graphs

John McConnell, Yewei Huang, Thomas Morris, Josh Doughty, Dennis Moynihan

Navigation / SLAM / Autonomous Driving
Abstract

Accurate, reliable state estimation is an essential component in underwater autonomy. In underwater environments, simultaneous localization and mapping (SLAM) is often employed to provide a robot with state estimates, given the lack of GPS. SLAM state estimates exhibit error growth over time, which can be mitigated by re-observing a previously visited location (place recognition). However, this is trajectory-dependent, prone to outlier measurements, and, if a robot is tasked with driving to a place recognition destination, it could cost critical resources such as fuel or battery life. This motivates the multi-session SLAM problem, where a robot finds place recognition between its own data and a prior SLAM mission, enhancing the quality and robustness of the current SLAM solution. We propose a semantic scene graph method that enables robust multi-session SLAM. We detect objects in the environment, build scene graphs, and propose methods for place recognition and for estimating rigid-body transforms between prior and current scene graphs. By using semantic scene graphs, we show robustness to outliers and degeneracy in underwater sonar data. Additionally, the proposed method is compact, allowing it to be stored with minimal memory cost and transmitted to robots with low latency. We validate our method across three real-world environments, performing ablation studies on the proposed system and benchmarking against the literature. The views expressed in this document are those of the author(s) and do not reflect the official policy or position of the U.S. Naval Academy, Department of the Navy, the Department of Defense, or the U.S. Government.

JFR 2026-06-10

Two Fossa Flat Minima Optimization Algorithm‐Based Enhancement of Ecological Balance Using Carbon‐Neutral Eco‐Robots With Situational Intelligence for Air Quality Monitoring

Kavitha Devi K, Rubin Bose S., M. Saravanan, Sathya Selvaraj Sinnasamy, L. Sasikala, Judy Flavia B., et al.

Abstract

To maintain ecological balance (EB), eco‐robots are equipped with situational intelligence (SI). Yet the robots themselves cause pollution through emissions. No existing study has focused on designing carbon‐neutral eco‐robots (CNERs) for air quality monitoring (AQM). Thus, this paper proposes an enhanced EB model using CNERs with SI for AQM. The process begins with deploying a lightweight, optimal eco‐robot design with a minimal carbon footprint. Next, an AI‐assisted eco‐robot‐based AQM framework is trained on the air quality and pollution assessment dataset. Here, the dataset is pre‐processed; then, features are extracted. Next, to classify the air quality as good, moderate, poor, or hazardous, the proposed probabilistic Matusita neural soft clipish network (PMNSCN) is employed. In real time, the environmental data are sensed via multi‐factor sensors. Afterward, to avoid unnecessary delays, the noisy data are refined. Then, the air quality is predicted, followed by on‐site active interventions (OAI). Risk‐based prioritization is done to offer long‐term measures. According to the risk zones, the authorities undertake eco‐robot‐based clean‐up efforts. The proposed approach ultimately promotes a sustainable environment with 98.9773% accuracy.

RA-L 2026-05-12

A Novel Kinematically Redundant Orthopedic Robot for Avoiding Interference in Long Bone Deformity Correction

Jianfeng Li, Qianhui Ma, Jie Ju, Tuxian Ye, Mingjie Dong, Shiping Zuo

Manipulation & Robot Arms
Abstract

Long bone deformity correction with orthopedic robots is guided by a preplanned correction path, and the continuous implementation of the entire process is highly important for achieving acceptable outcomes. However, multiple types of interference phenomena (i.e., strut overstroke, singularity configuration, and soft tissue-strut collision) frequently occur, which necessitate temporary robot reassembly and path replanning that interrupt the correction process. To solve this issue, this letter introduces the self-motion characteristics of redundant parallel mechanisms into the design of orthopedic robots. On this basis, a novel redundant configuration is proposed, and its mechatronic design is developed. Afterward, on the basis of the kinematic model and the cylinder-elliptical table analytical model, avoidance indices are presented for predicting the interference state. By integrating the indices and performing weight discrimination, a configuration optimization algorithm is constructed for all interference avoidance tasks. Finally, simulations and prototype experiments demonstrate that the proposed design concept and methodology can assist surgeons in planning an interference-free adjustment scheme and ensure the continuity of the correction process without altering the original robot assembly and correction path.

RA-L 2026-05-12

Improved Anti-Peak Extended State Observer Based Data-Driven Trajectory Tracking Control for Unmanned Marine Vehicles

Li-Ying Hao, Xuqi Zhang, Huiying Liu

Control & Dynamics
Abstract

For extended state observer (ESO)-based trajectory tracking, the peaking phenomenon in state estimates can cause instability or performance degradation. This work proposes an improved anti-peak ESO-based data-driven control strategy for unmanned marine vehicles (UMVs). A local compact form dynamic linearization (CFDL) model is established, relaxing the traditional requirement of non-zero input differences. An improved anti-peak ESO estimates lumped disturbances for compensation, while output error rate enhances tracking accuracy. The approach improves historical data utilization with low computational cost. Stability is proven via mathematical induction, and simulations show superior performance in tracking error reduction and peaking suppression compared to conventional ESO methods.

RA-L 2026-05-12

Diffusion-Enhanced Tree Planning for Autonomous Driving

Qiyuan Liu, Yunlong Gong, Binghong Jiang, Cheng Chang, Zhiheng Li, Xueqian Wang, et al.

Navigation / SLAM / Autonomous Driving
Abstract

In highly interactive urban driving, decision making is often naturally multi-stage, and decisions at different stages can lead to different reactions from surrounding vehicles. This calls for stage-wise evaluation and selection. Tree-based planning naturally supports multi-stage search and evaluation, but fixed stage partitions can introduce discontinuities at stage boundaries, leading to piecewise and unnatural trajectories. To address these issues, we propose a diffusion-enhanced tree planning framework that refines a trajectory tree in continuous space and enables single-step inference by directly predicting the denoised trajectory. The diffusion model refines ego candidates and outputs candidate-conditioned agent predictions and human-likeness scores for ranking. We then model and evaluate interaction risk directly on the tree by seeding collisions and propagating risk bidirectionally with temporal decay, together with spatial risk sharing across clustered nodes. Finally, the optimal trajectory is selected by jointly considering risk, human-likeness, and progress. Experiments show that our method produces smoother and more natural trajectories. It also achieves a better balance between safety and efficiency in closed-loop evaluation.

RA-L 2026-05-12

UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots

Kangning Yin, Weishuai Zeng, Ke Fan, Minyue Dai, Zirui Wang, Qiang Zhang, et al.

Humanoid Robots
Abstract

Achieving generalizable whole-body motion control is essential for deploying humanoid robots in real-world environments. However, existing MLP-based policies trained under partial observations often suffer from limited expressiveness and struggle to maintain global consistency. These shortcomings manifest as less expressive motion, orientation drift, and poor generalization across diverse behaviors. To address these limitations, we propose UniTracker, a three-stage framework for scalable and adaptive motion tracking. The first stage learns a privileged teacher policy that produces high-fidelity reference actions. Building on this, the second stage trains a CVAE-based universal policy that captures a global latent representation of motion, enabling robust performance under partial observations. Crucially, we align the partial-observation prior with a full-observation encoder, injecting global intent into the latent space. In the final stage, a lightweight adaptation module fine-tunes the student policy on challenging sequences, supporting both per-instance and batch adaptation. We validate UniTracker in simulation and on a Unitree G1 humanoid robot, demonstrating superior tracking accuracy, motion diversity, and deployment robustness compared to current baselines. Project page is available at https://yinkangning0124.github.io/Humanoid-UniTracker/

JFR 2026-04-16 · Cited by 1

Soft Computing Techniques Applied to Adaptive Hybrid Navigation Methods for Tethered Robots in Dynamic Environments

Chandan Sheikder, Weimin Zhang, Xiaopeng Chen, Fangxing Li, Yichang Liu, Xiaohai He, et al.

Humanoid Robots · Navigation / SLAM / Autonomous Driving
Abstract

Tethered robots face unique constraints: the physical tether can cause mission failure through snagging or excessive tension. Existing methods either require global maps (Tethered A*) or lack tether awareness (Bug algorithms). We present the first hybrid framework combining reactive obstacle avoidance with predictive tether management. Our system integrates (1) a tether‐aware cost function for 3–5 step snag prediction, (2) fuzzy logic for dynamic tension adaptation, and (3) genetic algorithms for tether routing optimization. Validation on a humanoid robot (ATLAS‐T) with instrumented sensors demonstrates 52% fewer snag incidents versus state‐of‐the‐art methods (Tethered A*, TSLAM), 94.7% success in tether‐critical scenarios, and 43% better odometry‐drift resilience. Real‐world experiments confirm practical viability.

RA-L 2026-05-11

Distributional Stability of Tangent-Linearized Gaussian Inference on Smooth Manifolds

Junghoon Seo, Hakjin Lee, Jaehoon Sim

Control & Dynamics
Abstract

Gaussian inference on smooth manifolds is central to robotics, but exact marginalization and conditioning are generally non-Gaussian and geometry-dependent. We study tangent-linearized Gaussian inference and derive explicit non-asymptotic $W_{2}$ stability bounds for projection marginalization and surface-measure conditioning. The bounds separate local second-order geometric distortion from nonlocal tail leakage and, for Gaussian inputs, yield closed-form diagnostics from $(\mu,\Sigma)$ and curvature/reach surrogates. Circle and planar-pushing experiments validate the predicted calibration transition near $\sqrt{\Vert \Sigma \Vert_{\text{op}}}/R\approx 1/6$ and indicate that normal-direction uncertainty is the dominant failure mode when locality breaks. These diagnostics provide practical triggers for switching from single-chart linearization to multi-chart or sample-based manifold inference. Code and Jupyter notebooks are available at https://github.com/mikigom/StabilityTLGaussian.

RA-L 2026-05-11

TinyVBS: Design and Implementation of a Palm-Sized Pneumatic Variable Buoyancy System for 100-Meter-Depth Operation in Compact Marine Robot

Hexiong Zhou, Zheng Zeng, Lian Lian

Control & Dynamics
Abstract

This paper presents the TinyVBS, a compact pneumatic variable buoyancy system designed for underwater robots operating at depths of up to 100 meters. The palm-sized device, weighing less than 500 grams, features a structurally simplified architecture, facilitating low-cost fabrication and effective compensation for depth-dependent elastic bladder deformation under hydrostatic pressure. By exploiting the inherent connection between bladder expansion volume and wall stress, the device obviates the need for flow or volumetric sensors. A novel pressure-adaptive control strategy dynamically modifies the behavior of the pneumatic components across variable environmental pressures. Laboratory hydrostatic environmental emulations exhibited steady closed-loop performance throughout five consecutive buoyancy adjustment cycles, achieving a total vertical movement of 700 m with a depth resolution of 0.5 m. Sea trials confirmed operational robustness to a maximum depth of 112 m, preserving functionality despite environmental disturbances including currents and thermal gradients. This advancement enables economical miniaturization of buoyancy control systems for coastal robotic applications with strict size and weight constraints.

RA-L 2026-05-11

Event-Triggered MPC With Linear Inter-Event Control for AV Path Tracking

Zhaodong Zhou, Jun Chen

Control & Dynamics
Abstract

Model predictive control (MPC) is widely used for autonomous vehicle path tracking due to its ability to handle system constraints and optimize performance over a prediction horizon. However, frequent online optimization imposes high computational demands, making the real-time implementation of MPC challenging. Event-triggered MPC aims to solve this issue by updating control actions only when a predefined condition is met, but it executes precomputed control sequences in an open-loop fashion between events, potentially allowing errors to accumulate. This paper proposes an event-triggered MPC framework integrated with a linear inter-event control mechanism to address this limitation. The proposed inter-event controller applies a least-squares-based linear model to generate control inputs in real time during inter-event periods, enabling continuous feedback corrections. Experimental results on a Quanser QCar2 platform demonstrate that the proposed approach improves tracking accuracy by 10% while significantly reducing the number of MPC optimizations compared to standard event-triggered MPC, offering an efficient solution for the real-time path tracking problem.

RA-L 2026-05-11

Probabilistic Collision Risk Estimation Through Gauss-Legendre Cubature and Non-Homogeneous Poisson Processes

Trent Weiss, Madhur Behl

Navigation / SLAM / Autonomous Driving
Abstract

Overtaking in high-speed autonomous racing demands precise, real-time estimation of collision risk, particularly in wheel-to-wheel scenarios where safety margins are minimal. Existing methods for collision risk estimation either rely on simplified geometric approximations, like bounding circles, or perform Monte Carlo sampling, which leads to overly conservative motion planning behavior at racing speeds. We introduce the Gauss–Legendre Rectangle (GLR) algorithm, a principled two-stage integration method that estimates collision risk by combining Gauss–Legendre cubature with a non-homogeneous Poisson process over time. GLR produces accurate risk estimates that account for vehicle geometry and trajectory uncertainty. In experiments across 446 overtaking scenarios in a high-fidelity Formula One racing simulation, GLR outperforms five state-of-the-art baselines, achieving an average error reduction of $77\%$ and surpassing the next-best method by $52\%$, all while running at 1000 Hz. The framework is general and applicable to broader motion planning contexts beyond autonomous racing.
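
The two mathematical ingredients named above can be illustrated with standard textbook relations. For a non-homogeneous Poisson process with rate $\lambda(t)$, the probability of at least one event in $[t_0, t_1]$ is $1 - \exp(-\int_{t_0}^{t_1} \lambda(t)\,dt)$, and the integral can be approximated by Gauss–Legendre quadrature. The sketch below applies a 3-point rule to a hypothetical scalar rate function. It is not the GLR algorithm itself, whose two-stage spatial and temporal integration is not detailed in the abstract.

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1];
# this rule is exact for polynomials up to degree 5.
_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)

def gauss_legendre_3(f, a, b):
    """Approximate the integral of f over [a, b] with the 3-point rule."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(w * f(mid + half * x) for x, w in zip(_NODES, _WEIGHTS))

def collision_probability(rate, t0, t1):
    """P(at least one event in [t0, t1]) for a Poisson process with rate(t)."""
    return 1.0 - math.exp(-gauss_legendre_3(rate, t0, t1))

# Hypothetical collision rate that peaks mid-maneuver (events per second).
def example_rate(t):
    return 0.3 * t * (2.0 - t)

risk = collision_probability(example_rate, 0.0, 2.0)
```

A handful of deterministic function evaluations replaces a large number of random samples here, which is the usual argument for quadrature over Monte Carlo when the integrand is smooth.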

RA-L 2026-05-11

ARTI-VIO: Asynchronous Multi-Camera Visual-Inertial Odometry With Feature Recall and Time-Interpolated Optimization

Yu Feng, Fanzhe Kong, Fuyong Wang

Navigation / SLAM / Autonomous Driving
Abstract

Multi-camera visual-inertial odometry (VIO) provides robust state estimation across wide fields of view, but asynchronous sensor streams without hardware synchronization introduce algorithmic and computational challenges. We present ARTI-VIO, a flexible framework for asynchronous multi-camera VIO with self-regulating stream rate control. In the front-end, we integrate a feature recall mechanism into the standard pipeline. Instead of discarding lost features, we temporarily retain them and re-associate them with newly extracted points via descriptor matching, thereby improving inter-frame feature consistency. In the back-end, we adopt a unified time-interpolated optimization framework that models asynchronous poses as interpolated functions over a sparse set of anchor poses, enabling the direct incorporation of misaligned observations as geometric constraints in the optimization. Experiments on a custom challenging dataset demonstrate that ARTI-VIO maintains real-time performance across various camera configurations and achieves competitive accuracy and robustness compared with state-of-the-art baselines. The code is available at https://github.com/NKU-ISAN-LAB/ARTI-VIO .
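
To make "poses as interpolated functions over anchor poses" concrete, here is a minimal sketch that evaluates a pose at an arbitrary timestamp between two anchors. The abstract does not state which interpolation scheme ARTI-VIO uses; linear interpolation for position and spherical linear interpolation (slerp) for orientation are assumptions, as is the `(time, position, quaternion)` anchor layout with quaternions in (w, x, y, z) order.

```python
import math


def slerp(q0, q1, s):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # Take the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: linear blend avoids dividing by ~0.
        q = tuple(a + s * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        w0 = math.sin((1.0 - s) * theta) / math.sin(theta)
        w1 = math.sin(s * theta) / math.sin(theta)
        q = tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def interpolate_pose(t, anchor0, anchor1):
    """Pose at time t between two anchors given as (time, position, quat)."""
    t0, p0, q0 = anchor0
    t1, p1, q1 = anchor1
    s = (t - t0) / (t1 - t0)
    position = tuple(a + s * (b - a) for a, b in zip(p0, p1))
    return position, slerp(q0, q1, s)
```

An observation from an unsynchronized camera stamped at `t` can then be expressed as a constraint on the two neighbouring anchor poses rather than on a separate state of its own.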

RA-L 2026-04-29

A Reconfigured Wheel-Legged Robot for Enhanced Steering and Adaptability

Zhicheng Song, Jinglan Xu, Chunxin Zheng, Yulin Li, Zhihai Bi, Jun Ma

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Wheel-legged robots integrate leg agility on rough terrain with wheel efficiency on flat ground. However, most existing designs do not fully capitalize on the benefits of both legged and wheeled structures, which limits overall system flexibility and efficiency. We present FLORES, a novel wheel-legged robot design featuring a distinctive front-leg configuration that sets it apart from standard design approaches. Specifically, FLORES replaces the conventional hip-roll degree of freedom (DoF) of the front leg with hip-yaw DoFs, and this allows for efficient movement on flat surfaces while ensuring adaptability when navigating complex terrains. This innovative design facilitates seamless transitions between different locomotion modes (i.e., legged locomotion and wheeled locomotion) and optimizes the performance across varied environments. To fully exploit FLORES's mechanical capabilities, we develop a tailored reinforcement learning (RL) controller that adapts the Hybrid Internal Model (HIM) with a customized reward structure optimized for our unique mechanical configuration. This framework enables the generation of adaptive, multi-modal locomotion strategies that facilitate smooth transitions between wheeled and legged movements. Furthermore, our distinctive joint design enables the robot to exhibit novel and highly efficient locomotion gaits that capitalize on the synergistic advantages of both locomotion modes. Through comprehensive experiments, we demonstrate FLORES's enhanced steering capabilities, improved navigation efficiency, and versatile locomotion across various terrains. The open-source project can be found at https://github.com/ZhichengSong6/FLORES.

JFR 2026-05-20

QRIVAS: Quadruped Robot‐Based Intelligent Visual Acquisition System for Bridge Component Inspection

Yuxuan Li, Linlong Meng, Liangjing Yang, Yuki Nishimura, Weilei Yu, Hengda Hu, et al.

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Bridge inspection constitutes a critical yet labor‐intensive task in civil infrastructure maintenance, often requiring access to confined, structurally complex environments. Conventional manual inspection suffers from low efficiency and high operational risks, and robotic solutions encounter limitations in GNSS‐denied and low‐illumination environments with texture‐deficient surfaces. This study proposes QRIVAS (quadruped robot‐based intelligent visual acquisition system), an autonomous framework for structural component image acquisition without relying on prior maps to reduce the workload for manual close‐proximity inspection. QRIVAS integrates 3D LiDAR SLAM with real‐time semantic segmentation, enabling reliable navigation and precise structural component identification. In this paper, we focus on the exploration and inspection of bridge columns, a representative and critical structural component of bridge systems. Experimental validation across simulated concrete railway viaducts and physical laboratory‐scale bridge models (1:3 scale) shows that QRIVAS achieved a 100% navigation success rate in simulation environments and a 96.7% average task navigation success rate across six bridge columns in a laboratory‐scale bridge specimen. Compared to existing research, QRIVAS shows consistent performance improvements across varying tolerance conditions (25 cm and 50 cm radius), maintaining robust operation under both flat concrete floor and rough artificial grass terrain conditions. This work demonstrates the potential of AI‐driven robotic systems to transform traditional infrastructure maintenance practices.

T-RO 2026-04-21

Extreme High-Gain Friction Observer of Flexible Joint Robots With $\mathcal {L}_{1}$ Adaptive Framework

Young Bin Lee, Tae Ho Yun, Min Jun Kim

Abstract

Compliance control enables flexible joint robots (FJRs) to interact with unknown environments, but joint friction may significantly degrade control performance and backdrivability. While several model-free friction observers for FJRs have been studied in recent decades, current approaches still face challenges when the robot interacts with stiff environments. To tackle this, this paper proposes a new friction observer based on an $\mathcal {L}_{1}$ adaptive framework. The main advantage of the proposed method is that it overcomes a fundamental trade-off in the state-of-the-art (SOTA) method between accurate friction compensation and natural environmental interactions. Moreover, the proposed approach enables the use of extremely high gains, which yield several additional benefits. First, unlike the conventional methods, which require feedback of so-called nominal signals obtained through simulation, measured motor signals can be fed back into the controllers, leading to a simpler implementation. Second, we provide performance analysis showing that increasing the gain improves performance and results in near-zero steady-state error. Third, the observer's performance can be adjusted using only a single parameter. Lastly, the numerical issues arising from extremely high gains are alleviated by employing a stable numerical method. The above theoretical findings are validated through simulations, and the effectiveness of the proposed approach is further evaluated with real-world experiments using both single- and 7-joint FJR systems. The results demonstrate that the proposed approach enables robots to interact with stiff environments more naturally, while achieving enhanced friction compensation performance.

RA-L 2026-05-15

Dual Reactive Planning for Heterogeneous Robots With Evolving Capabilities in Unknown Environments

Zhangli Zhou, Hao Li, Zhen Kan

Abstract

Heterogeneous robot teams executing Linear Temporal Logic (LTL) missions are usually modeled with fixed robot capabilities. In practice, capabilities may change during execution through tool acquisition, sensor activation, or module reconfiguration, making previously infeasible tasks executable and altering the remaining allocation. We present Dual Reactive Planning (DRP), a dual-reactive execution architecture for online LTL execution in initially unknown environments without assuming a complete prior map. DRP combines a shared automaton-based progress monitor with two replanning layers: local heuristic reassignment for environmental discoveries that change path costs, and mission-level reallocation for capability acquisitions that change task executability. The formulation targets missions whose allocation-relevant structure is specified through declared precedence relations and capability transitions. We formalize capability evolution through deterministic transition functions triggered by task completion and provide a scoped termination argument and complexity analysis. Comparative studies, targeted 2–4 robot capability-evolution benchmarks, and Gazebo experiments show that DRP solves all tested scenarios, whereas a capability-agnostic reactive baseline does not. Accounting for capability change during execution improves both mission completion and efficiency in the evaluated settings.
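
The notion of capability evolution "through deterministic transition functions triggered by task completion" can be sketched with plain sets. Everything below is illustrative: the task names, the capability labels, and the simplification that capabilities are only gained (never lost) are assumptions, and none of it is taken from the DRP implementation.

```python
def evolve(capabilities, completed_task, transitions):
    """Deterministic transition: completing a task may grant capabilities."""
    gained = transitions.get(completed_task, ())
    return frozenset(capabilities) | frozenset(gained)


def executable_tasks(capabilities, requirements):
    """Tasks whose required capabilities the robot currently holds."""
    held = set(capabilities)
    return sorted(task for task, need in requirements.items() if set(need) <= held)


# Hypothetical mission: a cutting task becomes executable after a tool pickup.
REQUIREMENTS = {"pick_tool": {"move"}, "cut": {"move", "cutter"}}
TRANSITIONS = {"pick_tool": {"cutter"}}
```

With these tables, a robot holding only `"move"` can execute `pick_tool`; after `evolve(...)` is applied for that task, `cut` enters its executable set, which is the kind of change that would prompt mission-level reallocation.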

RA-L 2026-04-27

DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

Ziyu Shan, Yuheng Zhou, Gaoyuan Wu, Ziheng Ji, Zhenyu Wu, Ziwei Wang

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the robot first navigates to a docking point and then performs fixed-base manipulation using powerful visuomotor policies. However, real-world mobile manipulation often suffers from the view generalization problem due to shifts of docking points. To address this issue, we propose a novel low-cost demonstration generation framework named DockAnywhere, which improves viewpoint generalization under docking variability by lifting a single demonstration to diverse feasible docking configurations. Specifically, DockAnywhere lifts a trajectory to any feasible docking point by decoupling docking-dependent base motions from contact-rich manipulation skills that remain invariant across viewpoints. Feasible docking proposals are sampled under feasibility constraints, and corresponding trajectories are generated via structure-preserving augmentation. Visual observations are synthesized in 3D space by representing the robot and objects as point clouds and applying point-level spatial editing to ensure the consistency of observation and action across viewpoints. Extensive experiments on ManiSkill and real-world platforms demonstrate that DockAnywhere substantially improves policy success rates and easily generalizes to novel viewpoints from docking points unseen during training, significantly enhancing the generalization capability of mobile manipulation policies in real-world deployment.

RA-L 2026-04-27

MyoDEA: A Self-Sensing Dielectric Elastomer Actuator for Real-Time Muscle Stiffness Monitoring

Seoyeon Ham, Liujun Xu, Siyi Xu

Perception & Sensing · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract

Monitoring muscle stiffness in real time during dynamic muscle activity is critical for rehabilitation, athletic training, and human-robot interaction. Existing technologies often require bulky hardware and complex multi-component systems, limiting reliable stiffness measurement in daily activities. In this work, we introduce MyoDEA, a centimeter-scale, 770 mg device based on a novel self-sensing dielectric elastomer actuator (DEA) for time-varying muscle stiffness detection. This DEA generates a 2.5 times higher blocked force than previously reported power-dense DEAs at operating frequencies below 10 Hz, while maintaining comparable bandwidth and energy output at resonance. A new self-sensing method is also developed to enable integrated actuation and sensing within a single DEA. The MyoDEA achieves continuous muscle stiffness tracking at 10 Hz over a range of 30 kPa to 140 kPa. Human-subject experiments with MyoDEA worn on the forearm over the flexor digitorum superficialis (FDS) muscle belly demonstrate reliable continuous stiffness tracking during isometric gripping across 0%-80% maximum voluntary contraction (%MVC).

RA-L 2026-04-27

RADRON: Cooperative Localization of Ionizing Radiation Sources by MAVs With Compton Cameras

Petr Stibinger, Tomas Baca, Daniela Doubravova, Jan Rusnak, Jaroslav Solc, Jan Jakubek, et al.

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing · Multi-Robot / Swarm
Abstract

We present a novel approach to localizing radioactive material by cooperating Micro Aerial Vehicles (MAVs). Our approach utilizes a state-of-the-art single-detector Compton camera as a highly sensitive, yet miniature detector of ionizing radiation. The detector's exceptionally low weight (40 g) opens up new possibilities of radiation detection by a team of cooperating agile MAVs. We propose a new fundamental concept of fusing the Compton camera measurements to estimate the position of the radiation source in real time even from extremely sparse measurements. The data readout and processing are performed directly onboard and the results are used in a dynamic feedback to drive the motion of the vehicles. The MAVs are stabilized in a tightly cooperating swarm to maximize the information gained by the Compton cameras, rapidly locate the radiation source, and even track a moving radiation source.

RA-L 2026-04-27

Differentiable Inverse Graphics for Zero-Shot Scene Reconstruction and Robot Grasping

Octavio Arriaga, Proneet Sharma, Jichen Guo, Marc Otto, Siddhant Kadwe, Rebecca Adam

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Operating effectively in novel real-world environments requires robotic systems to estimate and interact with previously unseen objects. Current state-of-the-art models address this challenge by using large amounts of training data and test-time samples to build black-box scene representations. In this work, we introduce a differentiable neuro-graphics model that combines neural foundation models with physics-based differentiable rendering to perform zero-shot scene reconstruction and robot grasping without relying on any additional 3D data or test-time samples. Our model solves a series of constrained optimization problems to estimate physically consistent scene parameters, such as meshes, lighting conditions, material properties, and 6D poses of previously unseen objects from a single RGBD image and bounding boxes. We evaluated our approach on standard model-free few-shot benchmarks and demonstrated that it outperforms existing algorithms for model-free few-shot pose estimation. Furthermore, we validated the accuracy of our scene reconstructions by applying our algorithm to a zero-shot grasping task. By enabling zero-shot, physically consistent scene reconstruction and grasping without reliance on extensive datasets or test-time sampling, our approach offers a pathway toward more data-efficient, interpretable, and generalizable robot autonomy in novel environments. Code is available at oarriaga.com.

RA-L 2026-04-27

Design of an Active Haptic Interface Using Proprioception Feedback for Continuous Endovascular Teleoperation

Ziyang Mei, Siyuan Han, Shengqian Qu, Ning Wang, Youchang Xia, Zitong Liao, et al.

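Navigation / SLAM / Autonomous Driving · Robot Learning · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract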

Force feedback is essential for safe endovascular teleoperation, yet it is typically constrained by complex sensor integration. This paper presents a compact active haptic interface system designed for robotic catheterization. Leveraging the intrinsic proprioception of a Permanent Magnet Synchronous Motor, the integrated design enables proprioceptive proximal force measurement, simplifying the mechanical structure while ensuring clinical deployability. To refine haptic fidelity, a hybrid estimation framework combines physics-based dynamics with a neural network to compensate for residual errors, optimizing the reconstruction of proximal interaction forces. The interface supports a continuous rate-control strategy, mapping estimated resistance into real-time active feedback. System experiments and user studies have demonstrated a delay of less than 5 ms and an estimation accuracy of 0.1 N. The proposed solution significantly reduces peak interaction forces and enhances obstacle awareness, offering a practical, cost-effective system-level solution for robotic vascular interventions.

RA-L 2026-04-27

Learning Dynamic Pick-and-Place for a Legged Manipulator

Moonkyu Jung, Jiseong Lee, Zhengmao He, Donghoon Youm, Juhyeok Mun, HyeongJun Kim, et al.

Humanoid Robots · Legged / Quadruped Robots · Manipulation & Robot Arms · Robot Learning
Abstract

Legged manipulators extend robotic capabilities beyond static manipulation by integrating agile locomotion with versatile arm control. However, achieving precise manipulation while maintaining coordinated locomotion remains a major challenge. This work presents a hierarchical reinforcement learning framework for dynamic pick-and-place tasks using a quadruped equipped with a 6-DOF robotic arm. The framework incorporates an explicit mass estimation module enabling adaptive whole-body control for objects with varying weights. In simulation, the system achieves an 86.05% success rate with payloads up to 2.3 kg. The approach is further validated through real-world experiments across six representative scenarios with controlled variations in object physical properties (size and mass) and task heights. Specifically, within a wide vertical workspace ranging from ground level to 1.1 m-high tabletops, the system demonstrates an average success rate of 73.3% for payloads up to 1.3 kg, with an average execution time of 4.06 s. Unlike prior works that handle lightweight objects and execute pick-and-place motions with slow, piecewise motions, the proposed framework exploits concurrent locomotion and manipulation for dynamic, continuous execution. These results demonstrate the potential of quadrupedal mobile manipulators for adaptive, whole-body pick-and-place with heavier payloads and extended workspaces.

RA-L 2026-04-27

TREND: Task-Oriented World Models for Visual Robotic Manipulation

Yuxiang Zheng, Tao Lu, Yinghao Cai

Manipulation & Robot Arms · Robot Learning · Control & Dynamics
Abstract

World models, the key component of model-based reinforcement learning (MBRL), enable sample-efficient learning by modeling the environment. However, for challenging visual robotic manipulation tasks with continuous action spaces, standard image reconstruction and uniform sampling schemes are suboptimal and may even impede learning. To address this, we propose TREND, a task-oriented world model that emphasizes robotic task-critical features and leverages motion-informative samples to accelerate environment modeling, thereby improving sample efficiency and policy convergence. Our method consists of two components: (1) Inter-Frame Difference Capture (IFDC): This mechanism replaces traditional image reconstruction and performs global and local difference capture to emphasize robotic task-critical regions, orienting the world model to reconstruct task-relevant features and facilitating task completion. (2) Motion-Density-guided Experience Replay (MDER): Instead of uniform sampling, this scheme employs kernel density estimation of end-effector positions to prioritize motion-informative trajectory samples, thereby enabling the world model to rapidly learn the dynamics of the robotic environment. Our experiments show that TREND significantly outperforms six other existing methods in terms of sample efficiency and success rate on ten simulated and five real-world visual robotic manipulation tasks.
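
The replay scheme in component (2) relies on kernel density estimation over end-effector positions. The sketch below shows one way such a density could be turned into sampling weights. The abstract does not say how density maps to priority, so the choice here (weight inversely proportional to density, favouring rarely visited positions) is an explicit assumption, as are the Gaussian kernel, the bandwidth value, and the function names.

```python
import math


def gaussian_kde(points, query, bandwidth):
    """Isotropic Gaussian kernel density estimate at `query`."""
    dim = len(query)
    norm = (bandwidth * math.sqrt(2.0 * math.pi)) ** dim * len(points)
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, query))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    return total / norm


def replay_weights(points, bandwidth=0.05):
    """Sampling weights inversely proportional to density (assumed rule)."""
    inverse = [1.0 / gaussian_kde(points, p, bandwidth) for p in points]
    scale = sum(inverse)
    return [w / scale for w in inverse]
```

The returned weights sum to one and could be passed to `random.choices(buffer, weights=...)` in place of uniform sampling; the density at a stored point is always positive because each point contributes to its own estimate.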

JFR 2026-05-18

Fuzzy Based Control Strategy and Dynamic Torque Adjustment in a Four Wheeled Coconut Tree Climber

Shree Rajesh Raagul Vadivel, Sakthiprasad Kuttankulangara Manoharan, Rajesh Kannan Megalingam, Praseeja Parakat, Brindha Shaju

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Inherent dangers and limitations of manual labor led to the development of coconut tree climbing robots as a safer and more efficient coconut farming method. Coconut tree climbing robots face significant challenges due to structural constraints, control complexities, and the unique dimensions of coconut trees. Limited research exists on effective designs, with many struggling to balance accuracy, speed, and adaptability. The study presents a novel control law tailored to critical climbing scenarios, enhancing both stability and efficiency. The proposed method ensures reliable ascent and descent by dynamically adapting to challenges such as tree inclines and varying diameters, switching control logic based on situational demands. Key climbing scenarios, including normal cases, wheel‐out‐of‐contact situations, and wheel‐at‐wedge cases, were considered. A fuzzy inference system was integrated to further refine the control strategy, dynamically adjusting torque based on real‐time parameters like inclination, current, encoder values, and diameter variations, thereby optimizing performance across diverse conditions. Detailed static and dynamic analyses shaped the development of a four‐wheeled climber, equipped with gas springs to maximize traction. Simulations validated the proposed control law, achieving a steady state climbing velocity of 0.2 m/s and a displacement of 4 m. Observed transient oscillations and initial peak angular velocities, ranging from to 0.5 rad/s, underscore the importance of accounting for real‐world dynamics. The climber was tested in three different scenarios and achieved a 96.67% overall success rate, completing 29 out of 30 trials. It performed flawlessly in two conditions and had a single failure on a straight tree with varying diameter.
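
For readers unfamiliar with fuzzy torque adjustment, a minimal zero-order Sugeno-style sketch follows. It uses a single input (trunk inclination) for brevity, whereas the abstract lists inclination, current, encoder values, and diameter. The membership functions, the three rules, and the gain values are all hypothetical and are not taken from the paper.

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 at or beyond left/right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# (left, peak, right) in degrees -> torque gain; all values hypothetical.
RULES = [
    ((-10.0, 0.0, 10.0), 1.0),   # low inclination: nominal torque
    ((0.0, 10.0, 20.0), 1.2),    # medium inclination: modest boost
    ((10.0, 20.0, 30.0), 1.5),   # high inclination: strong boost
]


def torque_gain(inclination_deg):
    """Weighted-average defuzzification over the rule base."""
    x = min(max(inclination_deg, 0.0), 20.0)  # clamp to the covered range
    fired = [(triangle(x, *mf), gain) for mf, gain in RULES]
    total = sum(weight for weight, _ in fired)
    return sum(weight * gain for weight, gain in fired) / total
```

A commanded torque would then be scaled as `torque_gain(inclination) * nominal_torque`; on the clamped range the memberships always sum to one, so the division is safe.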

RA-L 2026-05-11

A Virtual Mechanical Interaction Layer Enables Resilient Human-to-Robot Object Handovers

Omar Faris, Sławomir K. Tadeja, Fulvio Forni

Abstract

Object handover is a common form of interaction that is widely present in collaborative tasks. However, achieving it efficiently remains a challenge. We address the problem of ensuring resilient robotic actions that can adapt to complex changes in object pose during human-to-robot object handovers. We propose the use of Virtual Model Control to create an interaction layer that controls the robot and adapts to the dynamic changes in the handover process. Additionally, we propose the use of augmented reality to facilitate bidirectional communication between humans and robots during handovers. We assess the performance of our controller in a set of experiments that demonstrate its resilience to various sources of uncertainties, including complex changes to the object's pose during the handover. Finally, we performed a user study with 16 participants to understand human preferences for different robot control profiles and augmented reality visuals in object handovers. Our results showed a general preference for the proposed approach and revealed insights that can guide further development in adapting the interaction with the user.

IJRR 2026-04-01

Robot-assisted navigation with adaptive impedance and path planning for visually occluded users

Idil Ozdamar, Doganay Sirintuna, Pietro Balatti, Luca Fortini, Younggeol Cho, Francisco J. Ruiz-Ruiz, et al.

Humanoid Robots · Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Considering the number of visually impaired individuals worldwide and the limited availability of assistance for safe navigation, there is a growing need for intelligent robotic systems that can provide reliable guidance. This manuscript presents an adaptive 3D human guidance approach using a mobile manipulator to assist visually impaired users as they navigate unstructured spaces. The proposed system generates and executes a collision-free path toward a designated goal while avoiding both static and dynamic obstacles at varying heights for the entire system, which comprises an omnidirectional mobile base, a torque-controlled robotic arm, and the physically coupled user interacting with the arm’s end-effector. To ensure the full-body safety of the user, we introduce a leg tracking algorithm based on 2D LiDAR sensors integrated into the mobile base, along with a human height estimation algorithm that uses an RGB-D camera. Based on the real-time tracking of the human state, the proposed approach provides responsive guidance: Horizontal Pulling adjusts the robot’s lateral motion and base velocity according to the user’s leg position when deviations from the collision-free trajectory occur, while Vertical Pulling modulates the arm’s height to prevent upper-body collisions, based on estimated user height and the obstacle information from an attached 3D LiDAR. Additionally, an Impedance Tuning algorithm is designed to dynamically adjust impedance parameters in coordination with the pulling force based on the estimated collision risk level. The effectiveness of the proposed approach is demonstrated through extensive multi-subject user studies in controlled environments and a proof-of-concept deployment in a real-world scenario.

JFR 2026-05-26

Dynamic Environment Adaptive Path Planning for Mobile Robots: A Hybrid Enhanced Path‐Planning Approach

Junxu Hou, Hong Wang, Eryi Dong, Tao Wang, Fengkai Kang, Boyan Jiang

Navigation / SLAM / Autonomous Driving
Abstract

With the rapid advancement of mobile robotics, the demand for safe and efficient path planning has become increasingly prominent. This research aims to address the challenges of path redundancy and the lack of stable obstacle avoidance strategies encountered by mobile robots during path planning in dynamic environments. A novel hybrid enhanced path‐planning algorithm is proposed, which integrates obstacle information, robot safety data, path‐simplification techniques, and dynamic‐obstacle avoidance strategies. By combining global and local path planning and introducing a path evaluation system based on robot status and dynamic‐obstacle motion information, the algorithm achieves efficient and flexible path planning. The effectiveness and feasibility of the algorithm are validated through simulation experiments and real‐world testing. The results demonstrate that the algorithm can achieve safe and efficient path planning under different speeds and dynamic‐obstacle scenarios, significantly reducing the frequency of path switching and waiting times, thus enhancing the efficiency and autonomy of the robot's navigation in real scenes. In the experimental scene, the overall travel time is reduced by 8.3%, the time spent in the obstacle area is reduced by 48.9%, and the obstacle avoidance route is stable and efficient. Compared with a current advanced algorithm, the proposed algorithm improves the driving efficiency of the robot by more than 12%, and the robot behaves more safely in the face of dynamic obstacles. This research provides an effective solution for mobile robot path planning in complex environments, with significant theoretical and practical implications.

RA-L 2026-04-28

MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments

Kuankuan Sima, Longbin Tang, Zhenyu Yang, Haozhe Ma, Lin Zhao

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Autonomous navigation in unknown environments requires multi-scale spatial understanding that captures geometric details, topological connectivity, and global structure to support high-level decision making under partial observability. Existing approaches struggle to efficiently capture such multi-scale spatial understanding while maintaining low computational cost for real-time navigation. We present MacroNav, a learning-based navigation framework featuring two key components: (1) a lightweight context encoder trained via multi-task self-supervised learning to capture multi-scale, navigation-centric spatial representations; and (2) a reinforcement learning policy that seamlessly integrates these representations with graph-based reasoning for efficient action selection. Extensive experiments demonstrate the context encoder's effective and robust environmental understanding. Real-world deployments further validate MacroNav's effectiveness, yielding significant gains over state-of-the-art navigation methods in both Success Rate (SR) and Success weighted by Path Length (SPL), with superior computational efficiency.
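
The two metrics named at the end of this abstract have standard definitions in the embodied navigation literature: SR is the fraction of successful episodes, and SPL averages, over all episodes, the success indicator times the ratio of the shortest path length to the larger of the path actually taken and the shortest path. A small sketch follows; the episode tuple layout `(success, shortest_length, taken_length)` is an assumption made for illustration, and the paper's own evaluation code is not shown here.

```python
def success_rate(episodes):
    """Fraction of episodes that reached the goal."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length over (success, shortest, taken)."""
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A detour lowers the score; failures contribute zero.
            total += shortest / max(taken, shortest)
    return total / len(episodes)
```

For instance, one optimal success, one success with a path twice the optimal length, and one failure give SR = 2/3 and SPL = 0.5, which shows how SPL penalizes inefficient paths that SR alone would not distinguish.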

RA-L 2026-04-22

AISPO: Enhancing Depth Reliability for Robotic Manipulation of Non-Lambertian Objects via Affine-Invariant Shape Prior

Zhiming Chen, Linfang Zheng, Kun Zhang, Hyung Jin Chang, Wei Zhang, Hongyu Yu, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Reliable depth perception is critical for robotic manipulation, especially for non-Lambertian objects such as transparent or highly specular surfaces, where raw depth measurements are often corrupted or missing. These failures frequently propagate to motion planning, resulting in invalid grasp poses and execution errors. We propose AISPO, a depth completion framework that improves depth reliability for manipulation in challenging sensing conditions. AISPO combines multi-scale RGB-D feature fusion with an affine-invariant shape prior to enforce geometric consistency and mitigate catastrophic depth failures. Unlike methods that focus primarily on average depth accuracy, our approach emphasizes physical plausibility and structural integrity of the predicted depth maps. Extensive benchmark evaluations demonstrate competitive performance and strong generalization to unseen objects and novel scenes. Real-world grasping experiments further show that enhanced depth reliability significantly improves manipulation success rates, particularly for transparent objects where many existing methods fail to produce physically usable depth estimates.

RA-L 2026-04-22

Online Learning-Enhanced High Order Adaptive Safety Control

Lishuo Pan, Mattia Catellani, Thales C. Silva, Lorenzo Sabattini, Nora Ayanian

UAVs / Aerial Robots · Robot Learning · Control & Dynamics
Abstract

Control barrier functions (CBFs) are an effective model-based tool to formally certify the safety of a system. With the growing complexity of modern control problems, CBFs have received increasing attention in both optimization-based and learning-based control communities as a safety filter, owing to their provable guarantees. However, success in transferring these guarantees to real-world systems is critically tied to model accuracy. For example, payloads or wind disturbances can significantly influence the dynamics of an aerial vehicle and invalidate the safety guarantee. In this work, we propose an efficient yet flexible online learning-enhanced high-order adaptive control barrier function using Neural ODEs. Our approach improves the safety of a CBF controller on the fly, even under complex time-varying model perturbations. In particular, we deploy our hybrid adaptive CBF controller on a 38 g nano quadrotor, keeping a safe distance from the obstacle against an 18 km/h wind.
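
As background for how a CBF acts as a safety filter, the sketch below treats the simplest possible case: a one-dimensional single integrator ẋ = u approaching an obstacle from the left, with barrier h(x) = x_obs − d_min − x. The condition ḣ + αh ≥ 0 reduces to u ≤ αh, so the minimally invasive filter is a clip. This is a first-order, non-adaptive textbook example; the high-order, adaptive, Neural-ODE-based controller in the paper is not reproduced, and all parameter values are hypothetical.

```python
def barrier(x, x_obs, d_min):
    """h(x) >= 0 exactly when the robot keeps at least d_min clearance."""
    return x_obs - d_min - x


def cbf_filter(u_nominal, x, x_obs, d_min, alpha):
    """Closest input to u_nominal satisfying dh/dt + alpha * h >= 0."""
    # For x' = u, dh/dt = -u, so the constraint is u <= alpha * h(x).
    return min(u_nominal, alpha * barrier(x, x_obs, d_min))


def simulate(x0, u_nominal, x_obs, d_min, alpha, dt, steps):
    """Forward-Euler rollout with the safety filter in the loop."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_filter(u_nominal, x, x_obs, d_min, alpha)
    return x
```

In the rollout, the filtered input shrinks as the clearance margin closes, so the state approaches but does not cross the safety boundary provided `alpha * dt` is below one. Model error (for example unmodelled wind) breaks this guarantee, which is the gap the paper's online adaptation targets.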

RA-L 2026-04-22

Non-Intrusive Contactless Gesture Recognition for Human-Robot Interaction

Dimitrios Tsiakmakis, Massimiliano Iosi, Gastone Ciuti

Robot Learning · Perception & Sensing · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract

As robots increasingly integrate into collaborative and assistive environments, natural gesture interpretation becomes crucial for effective human-robot interaction. While vision-based gesture recognition systems face privacy concerns and performance limitations under occlusion and varying lighting conditions, wearable sensor approaches often compromise user comfort and interaction naturalness. These challenges necessitate alternative approaches that balance recognition accuracy with user acceptance in real-world deployment scenarios. This work proposes a lightweight transformer-based feature extractor where test-time augmentation with majority voting is employed during inference to enhance classification performance. With approximately 2M trainable parameters, the network maintains reasonable efficiency on modern acceleration hardware. Its modular architecture also allows easy adaptation to a variety of sensor configurations. It reached an overall accuracy of 97.21%, with similarly high scores for F1-score, precision, and recall. The pipeline's inference duration of 16 ms on GPU and 40 ms on CPU demonstrates its suitability for real-time applications in collaborative robotics.
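
Test-time augmentation with majority voting, as mentioned in this abstract, can be sketched independently of the network. The classifier is passed in as any callable, and the augmentations are hypothetical placeholders; the paper's actual augmentations and sensor data format are not described in the abstract.

```python
from collections import Counter


def predict_with_tta(classify, sample, augmentations):
    """Classify several augmented copies and return the most common label."""
    votes = [classify(augment(sample)) for augment in augmentations]
    label, _count = Counter(votes).most_common(1)[0]
    return label


def shift(offset):
    """Hypothetical augmentation: add a constant offset to every reading."""
    return lambda sample: [value + offset for value in sample]
```

With an odd number of augmentations and two classes, ties cannot occur; with more classes, `Counter.most_common` returns the label that was seen first among those tied, which may or may not be the desired tie-break.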

RA-L 2026-04-22

Bio-Inspired Flower Robot for Emotional Human-Robot Interaction

Qingyue Liu, Guojie Xu, Canrui Yang, Xunsheng Du, Igarashi Go, Hongqiang Wang

Robot Learning · Perception & Sensing · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation
Abstract

Mental health problems are common worldwide, but the available support systems and healthcare services are insufficient and inadequate. Social robots offer a promising solution to achieve social interaction and emotional connection. Nature-inspired architectural robotics can seamlessly blend into human daily life by reshaping future living environments. In this work, inspired by natural flowers, we designed a flower robot with a realistic morphology that simulates the opening-closing, rhythmic movements, and stem-bending movement. Building on the morphological design, we integrated a flex sensor into the petals to enable touch-induced tactile feedback, mimicking the touch-induced plant-like seismonastic movement. Furthermore, we incorporated voice interaction powered by a local large language model (LLM) to achieve adaptive intent understanding and personalized semantic engagement, thereby enhancing interaction flexibility compared to the repetitive tasks and rigid protocols of traditional social robots. Finally, the human-robot interaction (HRI) experiment demonstrated that the flower robot significantly reduced participants' negative emotions and provided positive social engagement. This work developed a bio-inspired flower robot, offering a promising pathway to social interaction and soothing presence.

RA-L 2026-04-22

ADAPT: Advanced Differential and Agile Parallel Transmission Mechanism for High-Performance Humanoid Robots

Yun-Ho Han, Min-Ho Park, Baek-Kyu Cho

Humanoid Robots · Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving
Abstract

Agile humanoid locomotion requires high joint torque and velocity, but conventional serial actuation, whether using high-reduction or Quasi-Direct Drive (QDD) actuators, faces limitations in size, weight, and efficiency. This paper proposes the Advanced Differential and Agile Parallel Transmission (ADAPT), a novel leg architecture that efficiently utilizes commercial actuators. Specifically, ADAPT integrates a Hip Pitch-Knee Differential Mechanism with a Knee-Ankle Parallel Transmission. This mechanism mechanically couples multiple actuators to generate output exceeding a single actuator's capability, while concentrating the leg's mass near the hip to minimize inertia. We validated the proposed design on a self-developed single-leg platform using a nonlinear kino-dynamic trajectory optimization framework. Experimental results from dynamic tasks including soccer kicking and vertical jumping demonstrate that the robot generates joint torques significantly exceeding the capability of a single actuator. This validates that the proposed architecture effectively overcomes the limitations of serial actuation, enabling high-performance dynamic behaviors efficiently.

RA-L 2026-04-22

VLAD-Grasp: Vision-Language Adaptive Depth Grasping for Part-Specific Manipulation

Meng Liu, Yanli Zou, Hao Wang, Shiquan Qiu, Fan Lei

Manipulation & Robot Arms · Perception & Sensing · Control & Dynamics
Abstract

In household service robot grasping tasks, precisely identifying and grasping specific object parts are essential for ensuring safety and execution efficiency. Although existing Vision-Language Models (VLMs) support open-vocabulary understanding, they still struggle with fine-grained semantic instructions such as “grasp the handle of a cup” or “hold the blade when passing a knife.” They also often fail to identify the correct object when multiple identical items appear in the scene. Conventional 6-DoF grasping approaches also suffer from unstable grasp depth and high collision risk. To overcome these challenges, we construct a new vision-language dataset, OVPGrasping, tailored for fine-grained, task-oriented grasping in household scenarios. We further propose two key modules: a Bidirectional Cross-Modal Enhancement Module (BCEM) that enables mutual enhancement between image and text features, and a Gated Cross-Modal Fusion (GCMF) module that selectively reinforces decoder queries with object-level semantic embeddings. In addition, we design an adaptive 6-DoF depth-aware grasping strategy that improves grasp stability and reduces collision probability. On the OVPGrasping dataset, our method improves detection accuracy by 1.67% over the perception baseline. In real-robot experiments, the single-object success rate increases from 70.83% to 78.33%, and the cluttered-scene success rate rises from 64% to 72%, validating the effectiveness of our fine-grained semantic understanding and adaptive grasping strategy.

IJRR 2026-03-31

Can familiarization with a supernumerary robotic finger restore task-dependent cortico-autonomic coupling in first-time healthy users?

Feryal A. Alskafi, Mohammad I. Awad, Faezeh Marzbanrad, Rateb Katmah, Sona Al Younis, Ahsan H. Khandoker, et al.

Manipulation & Robot Arms · Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Supernumerary Robotic Fingers (SRFs) have proven their capabilities in enhancing manipulation and dexterity in precision tasks, enabling multitasking in healthy individuals, while also playing a compensatory role for those with motor impairments. However, the literature on SRF adaptation and embodiment remains scarce, particularly in terms of the neurophysiological basis underlying their introduction, use, and assimilation. This study employed a controlled-time-delay-stability (CTDS) framework to investigate physiological embodiment by capturing multimodal signals (electroencephalography (EEG), electrodermal activity (EDA), photoplethysmography (PPG), and respiration) in 30 adults performing three activities of daily living. Tasks were completed across three phases: without SRF, with SRF immediately after attachment, and with SRF following individualized training. CTDS quantified pairwise strength and directionality of dynamic interactions while controlling for indirect effects. Network visualization provided a holistic view of brain–body reconfiguration with SRF use. Resting baselines showed no significant differences across phases, suggesting SRF attachment alone did not introduce sufficient stress to alter brain–body interactions. During tasks, brain–body interactions exhibited task-dependent modulation. This modulation was suppressed with initial SRF attachment but reappeared after training/familiarization, with post-training patterns becoming statistically indistinguishable from those without the SRF. These results indicate that SRFs can be effectively embodied into the body schema after targeted training. The transient plasticity observed in first-time users may serve as a biomarker for customizing SRF training and tracking neurorehabilitation progress. Extending prior findings of rapid cortical adaptation, this study shows that brain–body networks reconfigure in a task-specific manner toward assimilation following brief familiarization. Future work warrants clinical validation and translation.

IJRR 2026-03-31

Situationally-aware dynamics learning

Alejandro Murillo-González, Lantao Liu

UAVs / Aerial Robots · Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Perception & Sensing · Control & Dynamics
Abstract

Autonomous robots operating in complex, unstructured environments face significant challenges due to latent, unobserved factors that obscure their understanding of both their internal state and the external world. Addressing this challenge would enable robots to develop a more profound grasp of their operational context. To tackle this, we propose a novel framework for online learning of hidden state representations, with which the robots can adapt in real time to uncertain and dynamic conditions that would otherwise be ambiguous and result in suboptimal or erroneous behaviors. Our approach is formalized as a Generalized Hidden Parameter Markov Decision Process, which explicitly models the influence of unobserved parameters on both transition dynamics and reward structures. Our core innovation lies in learning online the joint distribution of state transitions, which serves as an expressive representation of latent ego- and environmental-factors. This probabilistic approach supports the identification and adaptation to different operational situations, improving robustness and safety. Through a multivariate extension of Bayesian Online Changepoint Detection, our method segments changes in the underlying data generating process governing the robot’s dynamics. The robot’s transition model is then informed with a symbolic representation of the current situation derived from the joint distribution of latest state transitions, enabling adaptive and context-aware decision-making. To demonstrate effectiveness, we validate our approach on an unmanned ground vehicle operating in diverse unstructured terrains, both in simulation and in real-world experiments. We also evaluate a quadrotor in simulation under randomly changing wind conditions. Both setups introduce unmodeled and unmeasured environmental factors that substantially affect robot motion. Extensive experiments in both simulation and real world reveal significant improvements in data efficiency, policy performance, and the emergence of safer, adaptive navigation strategies. Website: https://alejandromllo.github.io/research/situational-awareness/ .

RA-L 2026-04-27

Sim2Real Through Approximate Information States

Yunfu Deng, Yuhao Li, Josiah P. Hanna

Robot Learning · Control & Dynamics
Abstract

In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial but becomes harder to obtain as robots are deployed in increasingly complex and wide-scale domains. In such settings, simulators will likely fail to model all relevant details of a given target task, and this observation motivates the study of sim2real with simulators that leave out key task details. In this paper, we formalize and study the abstract sim2real problem: given an abstract simulator that models a target task at a coarse level of abstraction, how can we train a policy with RL in the abstract simulator and successfully transfer it to the real world? Our first contribution is to formalize this problem using the language of state abstraction from the RL literature. This framing shows that an abstract simulator can be grounded to match the target task if the grounded abstract dynamics take the history of states into account. Based on the formalism, we then introduce a method that uses real-world task data to correct the dynamics of the abstract simulator. We then show that this method enables successful policy transfer both in sim2sim and sim2real evaluation.

RA-L 2026-04-27

SPREAD: Scalable Pre-Trained World Model for Adaptive Dynamics Model

Jihun Moon, Seong-Woo Kim

Robot Learning · Control & Dynamics
Abstract

Autonomous robots must be capable of adapting not only to training datasets but also to unfamiliar environments. World models, which learn predictive dynamics of the environment, have been proposed to overcome the task-specific limitations of conventional RL. However, their capability is typically restricted to training distributions. In this paper, we introduce SPREAD, the first framework that enables a pre-trained world model to learn environmental changes online and continuously. SPREAD freezes the pre-trained model and updates only a set of LoRA adapters. This design jointly achieves three goals: (i) real-time adaptation, (ii) robustness to forgetting without a replay buffer, and (iii) forward transfer. Experiments demonstrate that SPREAD reduces prediction error by 40% within just 100 steps under diverse visual and dynamic changes, and that such adaptation directly translates into improved task success rates. Moreover, SPREAD validates its effectiveness in continual learning settings. By extending pre-trained world models to the domain of online continual learning, this work highlights world models that autonomously adapt to real-world dynamics.
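
The idea of freezing a pre-trained model and updating only LoRA adapters can be sketched on a single linear layer. This is a toy illustration under stated assumptions, not SPREAD itself: the class name `LoRALinear`, the squared-error loss, the constant initialization of `A`, the rank, and the learning rate are all hypothetical choices made so the example is reproducible.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scale * B @ A."""

    def __init__(self, W, rank=1, scale=1.0):
        self.W = [row[:] for row in W]  # frozen: never modified below
        d_out, d_in = len(W), len(W[0])
        # A starts small and nonzero (random in practice; constant here for reproducibility).
        self.A = [[0.1] * d_in for _ in range(rank)]
        # B starts at zero, so the adapter is initially a no-op.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = scale

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def adapt(self, x, target, lr=0.1):
        """One SGD step on 0.5 * ||forward(x) - target||^2; only A and B change.

        Returns the loss measured before the update.
        """
        h = matvec(self.A, x)
        err = [y - t for y, t in zip(self.forward(x), target)]
        # Gradients: dL/dB = scale * err h^T, dL/dA = scale * (B^T err) x^T.
        bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                  for k in range(len(h))]
        for i, e in enumerate(err):
            for k, hk in enumerate(h):
                self.B[i][k] -= lr * self.scale * e * hk
        for k, g in enumerate(bt_err):
            for j, xj in enumerate(x):
                self.A[k][j] -= lr * self.scale * g * xj
        return 0.5 * sum(e * e for e in err)
```

Because `B` is initialized to zero, the adapted layer reproduces the frozen layer exactly until the first update, and the base weights stay untouched throughout, which is what makes it possible to discard or swap an adapter without altering the pre-trained model.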

RA-L 2026-04-27

Teacher-Guided Terrain-Aware Learning for Recovery of Quadruped Robots

Boyuan Deng, Xu Yang, Yilin Mo, Nikolaos Tsagarakis

Legged / Quadruped Robots · Control & Dynamics
Abstract

Autonomous post-fall recovery is a key bottleneck for long-duration deployment of legged robots, yet prevailing methods depend on flat-ground target poses and heuristic rewards and thus fail to generalize under irregular terrain and random initial states. We introduce a teacher-guided, terrain-aware recovery framework, which relies solely on proprioception to drive a quadruped from arbitrary fallen configurations on uneven ground to an optimal posture. We formally define the terrain-adapted optimal recovery posture by formulating a kinematic–dynamic objective and evaluate outcomes using a unified force–angle stability margin. Building on this, a Teacher–Student PPO scheme distills the teacher's privileged terrain knowledge into a deployable student using imitation losses over short proprioceptive histories. Extensive simulation and hardware trials show a recovery success rate of 90.26% across diverse rough terrains and randomized states, with the teacher at 93.47% and the baseline at 78.6%, while stability margins remain positive in all successful trials, indicating reliable static stability of the final posture. The policy maintains robustness in extreme initial conditions and low-friction settings through repeated rolling and rocking. These results close gaps in the formalization of target postures and in evaluation standards for non-flat recovery and deliver a general, deployable proprioception-only paradigm for fall recovery on complex terrain. The supplementary material is available at https://boyuandeng.github.io/TAFR-TeacherGuidedFramework/ .

RA-L 2026-04-27

Friction Modeling of Tendon-Driven Continuum Robots Through Linear Complementarity Problem

Jia Shen, Brendan Browne, Junhyoung Ha, Yue Chen

Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano
Abstract

Tendon-driven continuum robots (TDCR) are widely used in medical interventions due to their inherent dexterity and compliance. However, precise motion planning and control of these robots remain challenging, largely because existing models do not accurately capture tendon frictional hysteresis. Predicting this friction-induced hysteresis has been recognized as an open problem due to the complex physics of tendon-disk interactions. In this letter, we propose a novel friction modeling approach for TDCR by incorporating the Capstan friction model as a set of complementarity constraints. Unlike conventional approaches that only predict the bounding effect of sliding friction, the proposed model captures the continuous change in tendon-disk friction forces and the transition between tendon sticking and sliding, thereby enabling the prediction of friction-induced hysteresis. The friction model is further formulated as the complementarity problem for convenient numerical implementation. We experimentally validated our approach on a simplified TDCR prototype and demonstrated a reduction of tip position error from 36.11 mm in conventional sliding friction models to 9.42 mm for a 402-mm-long robot. The source code is available at https://github.com/Jia0Shen/Friction-TDCR-Tension.
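
For readers unfamiliar with the Capstan friction model named in this abstract, the classic capstan relation bounds the tension ratio across a wrapped contact by exp(μφ), where μ is the friction coefficient and φ the wrap angle. The sketch below only classifies a single tendon-disk contact as sticking or sliding from that bound; it is not the paper's complementarity formulation, and the function names and state labels are hypothetical.

```python
import math


def capstan_bounds(t_in, mu, wrap_angle):
    """Range of output tensions that friction can sustain without slip."""
    ratio = math.exp(mu * wrap_angle)
    return t_in / ratio, t_in * ratio


def contact_state(t_in, t_out, mu, wrap_angle, tol=1e-9):
    """Classify one tendon-disk contact from the tensions on either side."""
    low, high = capstan_bounds(t_in, mu, wrap_angle)
    if t_out > high + tol:
        return "slide_toward_output"  # friction saturated, tendon pulled out
    if t_out < low - tol:
        return "slide_toward_input"   # friction saturated, tendon pulled back
    return "stick"                    # friction force lies inside its bound


low, high = capstan_bounds(10.0, 0.2, math.pi / 2)
print(round(low, 2), round(high, 2))
```

With a 10 N input tension, μ = 0.2, and a quarter-turn wrap, any output tension between roughly 7.3 N and 13.7 N is held by static friction. The gap between those two bounds is the source of the hysteresis the abstract refers to: the tendon does not move until the load leaves that band.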

RA-L 2026-04-27

A Hybrid Perching Mechanism for Aerial Robots: Branch and Surface Perching Robust to Large Surface Orientation Misalignment

Sudheera Akalanka Kariyawasam, Mahmud Hasan Saikot, Bo Cheng, Jianguo Zhao

UAVs / Aerial Robots · Manipulation & Robot Arms
Abstract

Perching has recently emerged as a key capability in aerial robotics, enabling substantial gains in energy efficiency and operational endurance for long-duration missions. In this work, we present a multifunctional perching mechanism capable of both branch and surface perching. For branch perching, the mechanism uses four simultaneously-actuated claws to grasp branches with diameters up to 35 mm. For surface perching, it leverages a bistable mechanism with five suction cups, allowing perching onto flat smooth surfaces with large orientation misalignment (with surface tilt angles of up to 80°). The mechanism is fabricated using a combination of multimaterial 3D printing and silicone molding, resulting in a lightweight prototype with a mass of 32 g. Experimental results show that the proposed design achieves 6.4× adhesion success compared to a bistable mechanism with a single suction cup. Furthermore, real-world perching experiments onto cars and tree branches demonstrate the effectiveness of the mechanism in different environments, highlighting its potential for energy-efficient and adaptable aerial robotic operations.

RA-L 2026-04-27

DoViT-Swarm: A Lightweight Dynamic Optimal Virtual Tube Planner for Swarm Aerial Robotics in Cluttered Environments

Pengxiao Wang, Haibin Duan

UAVs / Aerial Robots · Multi-Robot / Swarms
Abstract

Trajectory planning for swarm aerial robotics in cluttered and dynamic environments is significantly challenging. Most existing centralized planners for swarm robots either assume a static environment or impose prohibitive computational overhead. In this letter, a lightweight dynamic optimal virtual tube planner, DoViT-Swarm, is proposed. This method enables the generation of infinitely many optimal trajectories within the dynamic virtual tube while maintaining computational complexity independent of the number of agents in the swarm. To address multiple dynamic obstacles with distinct motions, an event-triggered real-time trajectory re-planning strategy with motion estimation of obstacles is proposed to enable obstacle avoidance while preserving the optimality and homotopy of trajectories. The dynamic optimal virtual tube is then constructed for reliable and safe motion of swarm robots. Both simulation and real-world experiments were conducted to validate the effectiveness of the proposed method.

JFR 2026-05-12

Traversability Risk Assessment and Path Planning for Off‐Road Autonomous Vehicles in Winter Conditions

Nan Wang, Xiang Li, Shilong Xu, Youkang Zhang, Jixin Wang, Dongxuan Xie

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Control & Dynamics
Abstract

While off‐road autonomous vehicles have achieved substantial deployment success, winter conditions introduce unprecedented operational challenges, potentially leading to critical failure modes such as slipping, rollover, collision, and sinking. This paper presents an integrated technical solution to address these winter navigation challenges. First, a vehicle speed prediction model is developed by fitting probability distributions and training a Multilayer Perceptron to forecast key parameters of the probability density function. For traversability risk assessment, mechanical principles are first leveraged to analyze the inherent correlations among hazard events, terrain characteristics, and vehicle dynamics. A hybrid framework integrating a Variational Bayesian Network with a Long Short‐Term Memory network (VBN‐LSTM) is then constructed to predict the probabilities of hazard occurrences by jointly leveraging causal structural priors and temporal dynamics. Building on this, a joint probability model for hazard events and vehicle speed is established, explicitly accounting for their interdependencies to enable comprehensive traversability risk evaluation. Finally, the Hybrid A* algorithm is enhanced by integrating speed distribution prediction and traversability risk assessment, facilitating safer and more reliable navigation in winter off‐road environments. The improved algorithm is validated in real‐world winter terrains through comparisons with other algorithms. Experimental results demonstrate that the planned paths generated by the proposed approach outperform competitors in terms of estimated risk, path efficiency, and travel time.

T-RO 2026-03-26

Interagent Beliefs for Learning to Communicate in Large-Scale Multirobot Visual Object Search

Jernej Puc, Gašper Škulj, Jan Pleterski, Primož Podržaj, Rok Vrabič

Navigation / SLAM / Autonomous Driving · Robot Learning · Multi-Robot / Swarms
Abstract

We present Century Maze, a simulated environment for cooperative navigation of groups numbering over 100 robots with primarily visual observations. The underlying task of large-scale object search is designed to push the scalability and efficiency of decentralized multi-robot systems relying on communication to overcome the limitations of separate units. To enable the training of robotic agents in large-scale visual embodied applications such as Century Maze, we propose two reinforcement learning (RL) components that facilitate the emergence of effective communication: Differentiable Inter-Agent Belief Learning (DIABL), which provides a clear learning signal by optimizing the agents' beliefs about their goals and propagating gradients back over a differentiable communication channel to include the appropriate observers, and Informative Event Replay (InfER), which compensates for the scarcity of samples relevant to DIABL by maintaining a dedicated buffer for auxiliary learning. We demonstrate our combined method through an actor-critic neural network architecture encompassing a decentralized multi-agent system of variable size with individual goal-based belief modules. Results on Century Maze show that our method is able to significantly improve performance through cooperation between over 100 agents against comparable baselines and maintains its effectiveness in realistic conditions.

RA-L 2026-04-20

Certified Coil Geometry Learning for Short-Range Magnetic Actuation and Spacecraft Docking Application

Yuta Takahashi, Hayate Tajima, Shin-ichiro Sakai

Robot Learning · Medical / Soft / Micro-Nano · Multi-Robot / Swarms
Abstract

This paper presents a learning-based framework for approximating an exact magnetic-field interaction model, supported by both numerical and experimental validation. High-fidelity magnetic-field interaction modeling is essential for achieving exceptional accuracy and responsiveness across a wide range of fields, including transportation, energy systems, medicine, biomedical robotics, and aerospace robotics. In aerospace engineering, magnetic actuation has been investigated as a fuel-free solution for multi-satellite attitude and formation control. Although the exact magnetic field can be computed from the Biot–Savart law, the associated computational cost is prohibitive, and prior studies have therefore relied on dipole approximations to improve efficiency. However, these approximations lose accuracy during proximity operations, leading to unstable behavior and even collisions. To address this limitation, we develop a learning-based approximation framework that faithfully reproduces the exact field while dramatically reducing computational cost. This framework directly derives a coefficient matrix that maps inter-satellite current vectors to the resulting forces and torques, enabling efficient computation of control current commands. The proposed method additionally provides a certified error bound, derived from the number of training samples, ensuring reliable prediction accuracy. The learned model can also accommodate interactions between coils of different sizes through appropriate geometric transformations, without retraining. To verify the effectiveness of the proposed framework under challenging conditions, a spacecraft docking scenario is examined through both numerical simulations and experimental validation.
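
The claim that dipole approximations lose accuracy at close range can be checked on the simplest textbook case: the on-axis field of a single circular current loop, for which the Biot–Savart integral has a closed form. The sketch below compares that exact expression with the far-field dipole formula. It is a special case chosen for illustration, not the paper's coil-pair force and torque model, and the function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A


def loop_field_on_axis(current, radius, z):
    """Exact on-axis field of a circular loop (closed-form Biot-Savart result)."""
    return MU0 * current * radius**2 / (2.0 * (radius**2 + z**2) ** 1.5)


def dipole_field_on_axis(current, radius, z):
    """Far-field dipole approximation with moment m = I * pi * R^2."""
    moment = current * math.pi * radius**2
    return MU0 * moment / (2.0 * math.pi * z**3)


def dipole_relative_error(radius, z):
    """Relative error of the dipole model; it depends only on the ratio radius / z."""
    exact = loop_field_on_axis(1.0, radius, z)
    approx = dipole_field_on_axis(1.0, radius, z)
    return (approx - exact) / exact


for z_over_r in (10.0, 3.0, 1.0):
    print(z_over_r, dipole_relative_error(0.1, 0.1 * z_over_r))
```

The ratio of the two expressions is (1 + R²/z²)^(3/2), so at ten loop radii the dipole model overestimates the field by about 1.5%, while at one loop radius it overestimates by about 183%. That growth is the proximity regime the abstract is concerned with.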

RA-L 2026-04-20

Vision-Based End-to-End Learning for UAV Traversal of Irregular Gaps via Differentiable Simulation

Linzuo Zhang, Yu Hu, Feng Yu, Yang Deng, Wenxian Yu, Danping Zou

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Navigation through narrow and irregular gaps is an essential skill in autonomous drones for applications such as inspection, search-and-rescue, and disaster response. However, traditional planning and control methods rely on explicit gap extraction and measurement, while recent end-to-end approaches often assume regularly shaped gaps, leading to poor generalization and limited practicality. In this work, we present a fully vision-based, end-to-end framework that maps depth images directly to control commands, enabling drones to traverse complex gaps within unseen environments. Operating in the special Euclidean group SE(3), where position and orientation are tightly coupled, the framework leverages differentiable simulation, a Stop-Gradient operator, and a Bimodal Initialization Distribution to achieve stable traversal through consecutive gaps. Two auxiliary prediction modules (a gap-crossing success classifier and a traversability predictor) further enhance continuous navigation and safety. Extensive simulation and real-world experiments demonstrate the approach's effectiveness, generalization capability, and practical robustness.

RA-L 2026-04-20

SGFAM: Semantic and Geometric Features Aggregation for Dense Shape Matching in Generalizable Robotic Manipulation

Paolo Sebeto, Christian Hartl-Nesic, Jean-Baptiste Weibel, Daniel Zimmer, Andreas Holzinger, Markus Vincze

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

To automate processes like polishing or cleaning at scale, robots must be able to adapt learned skills to new object instances without manual reprogramming. Applications requiring tool-surface interactions face a significant challenge in transferring manipulation strategies to novel objects due to substantial shape and appearance variations. Robust, generalized dense shape correspondence is essential for solving this problem. We present SGFAM, a zero-shot pipeline integrating pre-trained vision foundation models and geometric encoders via functional maps. Unlike prior works that rely on simple averaging, we introduce 1) an alignment-based feature aggregation to prioritize optimal viewing angles, and 2) Kernel PCA fusion to preserve non-linear descriptor manifolds. Our evaluations demonstrate that this approach not only outperforms state-of-the-art baselines but also enables lightweight vision backbones to achieve matching precision comparable to larger models. We validate SGFAM experimentally by successfully transferring continuous surface paths in real-world industrial and household robotic scenarios without requiring any finetuning.

RA-L 2026-04-20

A Data-Observation Hybrid Compensation Method for Precise Force Control of Cable-Driven Wrist Exoskeletons in Teleoperation

Fengyi Shi, Jiawei Tan, Dong Wang, Xingfang Wang, Hui Li, Xiao Huang, et al.

Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract

High-precision force control of wearable exoskeletons enables highly transparent force interaction operations, effectively improving the feasibility of teleoperation tasks. We adopt a cable-driven spherical parallel wrist exoskeleton (SPWE), a design that reduces volume and enhances operational flexibility; however, significant hysteresis effects and nonlinear friction lead to insufficient force control accuracy. This paper proposes a Data-Observation Hybrid Compensation (DOHC) method, which constructs a Gaussian process-based hysteresis-aware model using physical characteristics. With critical velocity as the judgment criterion, a friction observer is introduced under quasi-static conditions to achieve high-precision compensation across both quasi-static and motion states. Experimental results demonstrate that DOHC significantly improves friction modeling accuracy and joint torque tracking performance in SPWE, and exhibits effective force transparency in haptic rendering experiments, providing reliable technical support for immersive teleoperation tasks.

RA-L 2026-04-20

LLM-Diffu: Robot Dexterous Grasp Generation Network With Diffusion Model and LLM

Zhongli Wang, Pingyue Zhang, Shijie Guo, Haihang Wang, Yuanmin Dong

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Generating accurate dexterous grasps for objects with complex geometries remains a critical challenge in robotic manipulation. The paper proposes LLM-Diffu, a novel network architecture built on the principles of diffusion models. This architecture integrates a specialized basis point set (BPS) point cloud encoder, a denoising network, and large language models (LLMs) to generate feasible, diverse, and high-quality dexterous grasps. For the dexterous hand grasp dataset, the paper proposes a method for constructing the dataset, which incorporates a multimodal large language model (MLLM) to enhance the grasp quality of the dataset and is applicable to the construction of various types of datasets. Simulated experiments show our LLM-Diffu outperforms state-of-the-art methods, achieving 79.6% and 83.2% grasp success rates on DexGraspNet and MultiDex datasets, respectively. Our constructed dataset has strong generalizability, supporting training of various dexterous grasp generation methods with promising results. Finally, real-world robotic experiments confirm the practical applicability of our constructed dataset.

RA-L 2026-04-20

Safety-Aware Imitation Learning via MPC-Guided Disturbance Injection

Le Qiu, Yusuf Umut Ciftci, Somil Bansal

UAVs / Aerial Robots · Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Control & Dynamics
Abstract

Imitation Learning has provided a promising approach to learning complex robot behaviors from expert demonstrations. However, learned policies can make errors that lead to safety violations, which limits their deployment in safety-critical applications. We propose MPC-SafeGIL, a design-time approach that enhances the safety of imitation learning by injecting adversarial disturbances during expert demonstrations. This exposes the expert to a broader range of safety-critical scenarios and allows the imitation policy to learn robust recovery behaviors. Our method uses sampling-based Model Predictive Control (MPC) to approximate worst-case disturbances, making it scalable to high-dimensional and black-box dynamical systems. In contrast to prior work that relies on analytical models or interactive experts, MPC-SafeGIL integrates safety considerations directly into data collection. We validate our approach through extensive simulations including quadruped locomotion and visuomotor navigation and real-world experiments on a quadrotor, demonstrating improvements in both safety and task performance. Project Website: https://leqiu2003.github.io/MPCSafeGIL/
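
The idea of using sampling to approximate a worst-case disturbance for a black-box system can be sketched on a one-dimensional toy problem. All specifics below are assumptions made for illustration and are not taken from MPC-SafeGIL: the single-integrator dynamics, the proportional `expert`, the wall-distance safety margin, uniform random sampling, and every function name are hypothetical.

```python
import random


def step(x, u, d, dt=0.1):
    """Toy black-box dynamics: 1D single integrator with additive disturbance."""
    return x + dt * (u + d)


def expert(x, goal=0.8, gain=1.0):
    """Stand-in expert: proportional controller toward a goal near the wall."""
    return gain * (goal - x)


def min_margin(x0, disturbances, wall=1.0):
    """Smallest distance to the wall along a rollout (negative means collision)."""
    x, worst = x0, wall - x0
    for d in disturbances:
        x = step(x, expert(x), d)
        worst = min(worst, wall - x)
    return worst


def worst_case_disturbance(x0, d_max=0.5, horizon=10, n_samples=64, seed=0):
    """Sample bounded disturbance sequences and keep the least safe rollout.

    Returns the first disturbance of that sequence and its safety margin.
    """
    rng = random.Random(seed)
    candidates = [[0.0] * horizon]  # undisturbed rollout as a reference candidate
    candidates += [[rng.uniform(-d_max, d_max) for _ in range(horizon)]
                   for _ in range(n_samples)]
    worst_seq = min(candidates, key=lambda seq: min_margin(x0, seq))
    return worst_seq[0], min_margin(x0, worst_seq)


d0, margin = worst_case_disturbance(0.5)
print(d0, margin)
```

Only the first element of the selected sequence would be injected before re-planning at the next step, in the usual receding-horizon manner. Because the search needs nothing but forward rollouts, the same loop applies to dynamics with no analytical model, which is the scalability argument the abstract makes.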

RA-L 2026-04-20

1D2L: One-Arm Drag and Two-Arm Lift for Manipulating Large and Heavy Tabletop Objects

Meijie Zhou, Jianjun Yuan, Zhengtao Hu, Sheng Bao, Liang Du

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms · Control & Dynamics
Abstract

Dual-arm robots are widely used to cooperatively manipulate large, heavy objects that exceed single-arm payload limits. However, conventional dual-arm planners typically assume that the object is already positioned within the reachable and graspable workspace of both end-effectors. When the object lies outside this workspace or in an awkward orientation, such planners often fail or require additional hardware (e.g., a mobile base or overhead cranes) to reposition the robot or the object first. To address this limitation, we present 1D2L (One-Arm Drag and Two-Arm Lift), a task-and-motion planning framework for dual-arm tabletop manipulation. The framework extends manipulation capability by first dragging the object with one arm into a “liftable zone” on a horizontal planar surface, and then performing a cooperative two-arm lift. The liftable zone contains a pre-computed set of tabletop poses derived from kinematic reachability, grasp feasibility, and dynamic stability requirements for the subsequent dual-arm lift. Beyond lifting, 1D2L serves as a general paradigm for large-object tabletop manipulation, in which one-arm dragging pre-positions the object to enable subsequent two-arm tasks. We validate the 1D2L framework experimentally, demonstrating that it enables robots to handle large or heavy objects that are initially unreachable.

RA-L 2026-04-20

Multi-Robot Motion Planning From Vision and Language Using Heat-Inspired Diffusion

Jebeom Chae, Junwoo Chang, Seungho Yeom, Yujin Kim, Jongeun Choi

Navigation / SLAM / Autonomous Driving · Perception & Sensing · Multi-Robot / Swarms
Abstract

Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-conditioned Heat-inspired Diffusion (LHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LHD integrates semantic priors from CLIP, a vision-language model (VLM), with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution (OOD) scenarios—in terms of reachability—by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency. Project page is available at: https://jebeom.github.io/lhd_project_page/

RA-L 2026-04-20

DPNet: Doppler LiDAR Motion Planning for Highly-Dynamic Environments

Wei Zuo, Zeyi Ren, Chengyang Li, Yikun Wang, Mingle Zhao, Shuai Wang, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

Existing motion planning methods often struggle with rapid-motion obstacles due to an insufficient understanding of environmental changes. To address this, we propose integrating motion planners with Doppler LiDARs, which provide not only ranging measurements but also instantaneous point velocities. However, this integration is nontrivial due to the requirements of high accuracy and high frequency. To this end, we introduce Doppler Planning Network (DPNet), which tracks and reacts to rapid obstacles via Doppler model-based learning. We first propose a Doppler Kalman neural network (D-KalmanNet) to track obstacle states under a partially observable Gaussian state space model. We then leverage the predicted motions of obstacles to construct a Doppler-tuned model predictive control (DT-MPC) framework for ego-motion planning, enabling runtime auto-tuning of controller parameters. These two modules allow DPNet to learn fast environmental changes from minimal data while remaining lightweight, achieving high frequency and high accuracy in both tracking and planning. Experiments on a high-fidelity simulator and real-world datasets demonstrate the superiority of DPNet over extensive benchmark schemes.

RA-L 2026-04-20

Kinematics-Aware Diffusion Policy With Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation

Kangchen Lv, Mingrui Yu, Yongyi Jia, Chenyu Zhang, Xiang Li

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

Full-configuration control of robotic manipulators with awareness of whole-arm kinematics is crucial for many manipulation scenarios involving body collision avoidance or body-object interactions, making it insufficient to consider only the end-effector poses in policy learning. The typical approach for whole-arm manipulation is to learn actions in the robot's joint space. However, the misalignment between the joint space and the actual task space (i.e., 3D space) increases the complexity of policy learning, as generalization in task space requires the policy to intrinsically understand the non-linear arm kinematics, which is difficult to learn from limited demonstrations. To address this issue, this letter proposes a kinematics-aware imitation learning framework with consistent task, observation, and action spaces, all represented in the same 3D space. Specifically, we represent both robot states and actions using a set of 3D points on the arm body, naturally aligned with the 3D point cloud observations. This spatially consistent representation improves the policy's sample efficiency and spatial generalizability while enabling whole-arm control. Built upon the diffusion policy, we further incorporate kinematics priors into the diffusion processes to guarantee the kinematic feasibility of output actions. The joint angle commands are finally calculated through an optimization-based whole-arm inverse kinematics solver for execution. Simulation and real-world experimental results demonstrate higher success rates and stronger spatial generalizability of our approach compared to existing methods in arm-aware manipulation policy learning. Supplementary materials are available at our Project Website.

RA-L 2026-04-20

A Multi-Camera Coordinated Localization Approach for Robust State Estimation of Flying Robots

Zijia He, Sijia Chen, Yuan Gao, Wei Dong

UAV / Aerial Robots · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Ground-air-based visual localization for UAVs, which typically relies on a single camera to track artificial markers, is often hampered by a limited field of view and sensitivity to illumination. Although active camera scheduling can mitigate these issues to some extent, it often incurs high motion costs and intermittent tracking losses during large-scale or agile UAV maneuvers. This paper presents a coordinated ground-air localization approach based on multi-camera scheduling to achieve continuous and full-coverage tracking. We equip UAVs with active infrared markers arranged in a distinctive structure, enabling robust 6D relative state estimation through a coarse-to-fine feature extraction and geometric matching. Building on this, we propose a predictive sparse scheduling framework that jointly optimizes camera activation and control over a receding horizon, explicitly balancing observation quality with actuation effort. Experimental results show our method achieves high-precision, robust localization under strong illumination interference. Compared to a fixed-camera baseline, the approach reduces the target loss rate by 29.74%, and decreases camera rotations by 70.32% compared to a basic active-vision method, demonstrating both efficient and reliable performance.

IJRR 2026-03-28

Cooperative task spaces for multi-arm manipulation control based on similarity transformations

Tobias Löw, Cem Bilaloglu, Sylvain Calinon

Humanoid Robots · Manipulation & Robot Arms · Multi-Robot / Swarm · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract

Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom, coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work represents the theoretical foundations of this cooperative manipulation control framework, and thus the experiments are presented in an abstract way, while giving pointers toward potential future applications.

T-RO 2026-03-24

MoCom: Motion-based Inter-MAV Visual Communication Using Event Vision and Spiking Neural Networks

Zhang Nengbo, Hann Woei Ho, Ye Zhou

UAV / Aerial Robots · Robot Learning · Perception & Sensing · Multi-Robot / Swarm
Abstract

Reliable communication in Micro Air Vehicle (MAV) swarms is challenging in environments where conventional radio-based methods suffer from spectrum congestion, jamming, and high power consumption. Inspired by the waggle dance of honeybees, which efficiently communicate the location of food sources without sound or contact, we propose a novel visual communication framework for MAV swarms using motion-based signaling. In this framework, MAVs convey information, such as heading and distance, through deliberate flight patterns, which are passively captured by event cameras and interpreted using a predefined visual codebook of four motion primitives: vertical (up/down), horizontal (left/right), left-to-up-to-right, and left-to-down-to-right, representing control symbols (“start”, “end”, “1”, “0”). To decode these signals, we design an event frame-based segmentation model and a lightweight Spiking Neural Network (SNN) for action recognition. An integrated decoding algorithm then combines segmentation and classification to robustly interpret MAV motion sequences. Experimental results validate the framework's effectiveness, demonstrating accurate decoding and low power consumption and highlighting its potential as an energy-efficient alternative for MAV communication in constrained environments.
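
To make the codebook idea concrete, the following is a minimal sketch of how a stream of already-classified motion primitives could be turned back into framed bit strings. It is not the paper's decoding algorithm: the primitive labels, the `CODEBOOK` dictionary, and the `decode_frames` function are hypothetical names, and the pairing of primitives to symbols simply assumes the two lists in the abstract are given in the same order.

```python
from typing import Iterable, List, Optional

# Hypothetical labels for the four motion primitives named in the abstract.
# Assumption: primitives and symbols are paired in the order they are listed.
CODEBOOK = {
    "vertical": "start",
    "horizontal": "end",
    "left_up_right": "1",
    "left_down_right": "0",
}


def decode_frames(primitives: Iterable[str]) -> List[str]:
    """Collect the bits found between each 'start' and 'end' symbol."""
    messages: List[str] = []
    bits: Optional[List[str]] = None  # None means we are outside a frame
    for primitive in primitives:
        symbol = CODEBOOK.get(primitive)
        if symbol is None:
            continue  # unrecognized primitive: ignore it
        if symbol == "start":
            bits = []  # open a new frame, discarding any unfinished one
        elif symbol == "end":
            if bits is not None:
                messages.append("".join(bits))
            bits = None
        elif bits is not None:
            bits.append(symbol)  # "1" or "0" inside an open frame
    return messages
```

Under these assumptions, `decode_frames(["vertical", "left_up_right", "left_down_right", "left_up_right", "horizontal"])` yields one message, `"101"`. Bits observed outside a start/end frame are dropped, which is one simple way to tolerate a receiver that begins watching mid-transmission.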

RA-L 2026-04-24

PGCSPose: Physics-Constrained Generation and Causal Semantic Fusion for Robust In-Hand Pose Estimation

Peiliang Wu, Yao Li, Mingyue Niu, Wenbai Chen, Guowei Gao

Manipulation & Robot Arms · Perception & Sensing
Abstract

Accurate in-hand pose estimation is essential for dexterous robotic manipulation but remains fragile under severe visual occlusion (>50%) and intermittent tactile contact. Existing visuo-tactile fusion methods treat vision and touch symmetrically, overlooking the asymmetric dependency: visual observations (object geometry and grasp) influence tactile feedback through contact mechanics. We present PGCSPose, a framework that leverages this vision→tactile dependency via two components: TacDiffusion, a physics-constrained diffusion model that synthesizes plausible tactile signals under contact loss, and CauCon, a reasoning module inspired by causal principles that integrates semantic priors from language models for robust inference. Extensive evaluation on ObjectInHand and VinT-6D demonstrates that PGCSPose reduces pose error by >50% under 70% occlusion and 60% tactile dropout, achieving position errors of 0.44–0.47 cm and angular errors of 0.075–0.081 rad across benchmarks, while maintaining real-time performance (16.1 FPS). These results demonstrate that physics-constrained generation and causally-inspired semantic reasoning enable reliable manipulation under degraded sensing conditions.

IJRR 2026-03-27

Robotic perception and manipulation of deformable linear objects: A survey

Alessio Caporali, Ignacio Cuiral-Zueco, Gonzalo López-Nicolás, Gianluca Palli

UAV / Aerial Robots · Manipulation & Robot Arms · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

Deformable linear objects (DLOs) are widely encountered in everyday life, taking forms such as plastic tubes, wires, ropes, and cables. They are prevalent across diverse settings, including industrial, domestic, and medical environments, as well as in outdoor applications like electric power lines, subaquatic cables, and aerial transport systems. These objects are termed deformable due to their ability to undergo significant shape changes under external forces, and linear because their length vastly exceeds their cross-sectional dimensions. Despite their importance and widespread presence, developing robotic systems capable of interacting with DLOs poses numerous challenges. This survey presents a comprehensive review of the state-of-the-art methods developed over the past decade to address these challenges. It covers key areas including physical and data-driven modeling techniques, simulation environments, perception approaches based on vision and tactile sensing, as well as strategies for estimation, planning, and control. It also reviews common manipulation tasks such as grasping, shaping, routing, knotting, suturing, and transport. The survey concludes with a critical discussion of current limitations and outlines promising directions for future research.

RA-L 2026-04-22

From Tool to Teammate: Balancing Task-Oriented and Social Talk During Human–Robot Collaboration

Chayan Sarkar, Rebecca Ramnauth, Kate Candon, Alexander Lew, Marynel Vázquez

Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract

As robots become more integrated into shared work environments, effective communication must balance clear task directives with social engagement to support both efficiency and user satisfaction. This letter examines how people perceive task-oriented versus social dialogue during human–robot collaboration and how they judge the appropriateness of switching between them. We discuss a pilot, in-person data collection and two follow-up online studies with an autonomous robot in a collaborative pizza-making task. The pilot assesses the technical feasibility of an autonomous robot generating distinct task and social dialogue styles. Next, with a larger participant sample, we evaluate the salience of the two talk types and their impact on perceptions of the robot as a collaborative partner. The second study explores judgments of talk type switching using video clips from the pilot, revealing how timing, task state, and user behavior influence perceived appropriateness. Together, these findings highlight nuanced human expectations for robot communication and inform the design of collaborative robots that can adaptively balance task-oriented and social talk in contextually sensitive ways.

RA-L 2026-04-22

Single-Actuator Gripper Using an Antagonistic Cable-Driven Differential Mechanism for Adaptive Fixed-Position Grasping

Xiao Jiang, Jiahao Shen, Diwei Huang, Xiang Cheng, Chongkun Xia, Bin Liang

Manipulation & Robot Arms · Control & Dynamics
Abstract

Fixed-position grasping is an efficient strategy in robotic assembly. Adaptive grippers employing differential mechanisms (DMs) can passively compensate for misalignment between gripper and object. However, conventional DMs, such as gear-based differential mechanisms (GDMs), are limited by rotational inertia and friction, generating excessive contact forces that compromise the object's positional stability and ultimately degrade assembly accuracy. To address this issue, a single-actuator gripper based on an antagonistic cable-driven differential mechanism (CDDM) is proposed, in which differential cable tensions are converted into low-friction pulley rotations to enable low-contact-force adaptive grasping. The kinematic relationship between the CDDM and the gripper stroke is modeled, and a kinematics-based optimization method is developed to minimize the gripper's dimensions and achieve the prescribed stroke without converging to local optima. Furthermore, the dynamics of the CDDM-based gripper are established to evaluate the contact forces on the object. Finally, an adaptive CDDM-based gripper is built to validate its payload capacity, repeatability, and adaptive performance in fixed-position grasping. Experimental results show that the proposed gripper outperforms a gear-based design, reducing the contact force by 53.28%, achieving a repeatability displacement accuracy of less than 0.30 mm during long-term operation, and limiting error to 0.28 mm for picks with a 4.02 mm positional offset.

RA-L 2026-04-22

Spike-EVPR: Deep Spiking Residual Networks With SNN-Tailored Representations for Event-Based Visual Place Recognition

Zuntao Liu, Yaohui Li, Chenming Hu, Delei Kong, Junjie Jiang, Zheng Fang

Robot Learning · Perception & Sensing
Abstract

Event cameras are ideal for visual place recognition (VPR) in challenging environments due to their high temporal resolution and high dynamic range. However, existing methods convert sparse events into dense frame-like representations for Artificial Neural Networks (ANNs), ignoring event sparsity and incurring high computational cost. Spiking Neural Networks (SNNs) complement event data through discrete spike signals to enable energy-efficient VPR, but their application is hindered by the lack of effective spike-compatible representations and deep architectures for learning discriminative global descriptors. To address these limitations, we propose Spike-EVPR, a directly trained, end-to-end SNN framework tailored for event-based VPR. First, we introduce two complementary event representations, MCS-Tensor and TSS-Tensor, designed to reduce temporal redundancy while preserving essential spatio-temporal cues. Furthermore, we propose a deep spiking residual architecture that aggregates these features to generate robust place descriptors. Extensive experiments on the Brisbane-Event-VPR and DDD20 datasets demonstrate that Spike-EVPR achieves state-of-the-art performance, improving Recall@1 by 7.61% and 13.20%, respectively, while significantly reducing energy consumption.

RA-L 2026-04-22

Speed-Optimized Motion Planning for Robotic Object Handling via Constrained Linear Convex Optimization

Yousef Farid, Marco Baracca, Giorgio Simonini, Paolo Salaris

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving
Abstract

This paper presents a framework for speed-optimized motion planning in robotic object handling, including dynamic throwing and pick-and-place tasks with obstacle avoidance. An S-curve-based optimization algorithm first computes optimal throwing configurations to ensure accurate object landing. A two-stage trajectory generation strategy is adopted: (1) speed-optimized trajectory planning to compute time-efficient velocity profiles under physical constraints, and (2) B-spline-based trajectory smoothing to produce smooth joint trajectories. Velocity, acceleration, torque, and jerk constraints are reformulated as convex problems, with cubic B-splines representing the squared pseudo-velocity and a convex relaxation addressing non-convex jerk limits. For obstacle-aware motions, a discrete LP framework parameterizes the path with cubic polynomials and enforces convex polyhedral constraints on the end-effector to maintain safety. The LP maximizes traversal speed while satisfying joint and dynamic limits. Simulation and real-world tests on a Franka Emika Panda robot with a qb SoftHand as a gripper show precise, fast object handling, including throwing and pick-and-place operations.
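
Why the squared pseudo-velocity makes these constraints convex can be seen from the standard change of variables used in time-optimal path parameterization. The derivation below is a sketch under that textbook formulation and may differ from the paper's exact problem; the symbols $s$, $a$, $b$, $m$, $c$, and $g$ are notation introduced here for illustration.

```latex
% Path q(s), s in [0,1], traversed with path speed \dot{s} (the pseudo-velocity).
\begin{align}
\dot{q} &= q'(s)\,\dot{s}, &
\ddot{q} &= q'(s)\,\ddot{s} + q''(s)\,\dot{s}^{2}. \\
\intertext{Define $a(s) = \ddot{s}$ and $b(s) = \dot{s}^{2}$, so that $b'(s) = 2\,a(s)$. Then}
\dot{q}_i^{2} &= q_i'(s)^{2}\, b(s) \le \dot{q}_{i,\max}^{2}, \\
\bigl| q_i'(s)\,a(s) + q_i''(s)\,b(s) \bigr| &\le \ddot{q}_{i,\max}, \\
\tau &= M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + g(q)
      = m(s)\,a(s) + c(s)\,b(s) + g(s),
\end{align}
with $m(s) = M(q)\,q'(s)$ and $c(s) = M(q)\,q''(s) + C(q, q')\,q'(s)$.
```

Velocity, acceleration, and torque limits are therefore linear in $(a, b)$ at each path point. Jerk additionally involves products such as $\dot{s}\,\ddot{s} = \sqrt{b}\,a$, which are not linear in $(a, b)$, consistent with the abstract's use of a convex relaxation for the jerk limits.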

RA-L 2026-04-22

Toward a Multi-Embodied Grasping Agent

Roman Freiberg, Alexander Qualmann, Ngo Anh Vien, Gerhard Neumann

Humanoid Robots · Manipulation & Robot Arms
Abstract

Multi-embodiment grasping aims to develop approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the gripper kinematic structure implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that handles different gripper types with variable degrees of freedom and exploits the underlying kinematic model, deducing all necessary information solely from gripper and scene geometry. Unlike previous equivariant grasping methods, we implement all modules in JAX and provide batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance, and faster inference. Our dataset encompasses grippers ranging from humanoid hands to parallel-jaw designs, including 25,000 scenes and 20 million grasps.

Sci. Robotics 2026-03-25

Driver’s licenses for autonomous systems

Sebastian M. Pfotenhauer, Alexander Wentland, Manuel Jung, Markus Lienkamp, Dava Newman

Abstract

Familiar licensing routines, like driving exams, may beat technical checklists in building trust in autonomous systems.

JFR 2026-05-25

A Depth Control Method for Full Ocean Depth AUV

Yueming Li, Enrui Sui, Jian Cao, Ye Li, Yanqing Jiang, Bo Wang, et al.

Abstract

This paper presents an integrated vertical control method for online identification and compensation of residual buoyancy, designed for full‐ocean‐depth (FOD) autonomous underwater vehicles (AUVs) that rely on vertical thrusters for depth‐keeping or height‐keeping hovering tasks. To address the residual buoyancy drift caused by vehicle hull compression and seawater density variations under extreme hydrostatic pressure, the proposed method constructs a condition‐triggered bidirectional switching framework between velocity and position control modes: During the large‐error stage, a velocity control mode is employed to rapidly drive the vehicle into a low‐speed region. Once the depth error and vertical velocity satisfy predefined threshold conditions, the system switches to a position control mode. In this mode, the equivalent residual buoyancy is estimated online via thruster force equilibrium and applied as feedforward compensation to achieve precise and stable hovering. If operating conditions change, the switching and identification mechanisms can be retriggered to maintain control performance. The proposed method was validated through test‐tank experiments and sea trials for the 1000‐, 7000‐, and 11,000‐m classes using the Wukong FOD AUV. Field results demonstrated that the system achieved average height tracking errors of 0.0124 and 0.0668 m in the test tank and 11,000‐m class trials, respectively. These results verify the effectiveness and engineering applicability of the proposed method in rejecting unknown residual buoyancy disturbances and achieving stable hovering in hadal environments.
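
The switching logic described above can be sketched roughly as follows. This is an illustrative toy, not the controller used on the Wukong AUV: the class name, all thresholds and gains, the hysteresis factor of 2, and the low-pass estimator are assumptions made for the example. The estimator relies on the idea that, near a steady hover, the thrust being applied approximately balances the residual buoyancy.

```python
from dataclasses import dataclass


@dataclass
class DepthSwitchingController:
    # All thresholds and gains below are hypothetical placeholders.
    err_threshold: float = 0.5    # m, depth error needed to enter position mode
    vel_threshold: float = 0.05   # m/s, vertical speed needed to enter position mode
    cruise_speed: float = 0.3     # m/s, reference speed in velocity mode
    kv: float = 40.0              # velocity-loop gain
    kp: float = 20.0              # position-loop proportional gain
    kd: float = 30.0              # position-loop damping gain
    alpha: float = 0.05           # low-pass factor for the buoyancy estimate
    mode: str = "velocity"
    hold_thrust_est: float = 0.0  # estimated thrust that cancels residual buoyancy

    def step(self, depth_err: float, vert_vel: float, last_thrust: float) -> float:
        """Return a thrust command; depth_err is target depth minus current depth."""
        small = (abs(depth_err) < self.err_threshold
                 and abs(vert_vel) < self.vel_threshold)
        if self.mode == "velocity" and small:
            self.mode = "position"
        elif self.mode == "position" and abs(depth_err) > 2.0 * self.err_threshold:
            # Conditions changed: fall back to velocity mode (retrigger).
            self.mode = "velocity"

        if self.mode == "velocity":
            v_ref = self.cruise_speed if depth_err > 0 else -self.cruise_speed
            return self.kv * (v_ref - vert_vel) + self.hold_thrust_est

        # Position mode: update the equilibrium-thrust estimate from the thrust
        # applied on the previous step, then use it as feedforward.
        self.hold_thrust_est += self.alpha * (last_thrust - self.hold_thrust_est)
        return self.hold_thrust_est + self.kp * depth_err - self.kd * vert_vel
```

Because the estimate is filtered from the thrust the loop itself produces, the feedforward term behaves much like slow integral action, which is one simple way to absorb a slowly drifting buoyancy offset.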

JFR 2026-05-13

Design and Kinematic Analysis of a Six‐Wheeled Robot With a Passive Suspension for Integrated Terrain Adaptability and Vibration Mitigation

Xiaoliang Zhang, Longjin Liang, Pingyi Liu, Ying Chen, Hengda Li, Liang Sun

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

This paper presents a six‐wheeled mobile robot equipped with a novel passive adaptive suspension system. By integrating spring dampers into the bogies, the system yields two distinct configurations: a tandem lateral swing suspension and a parallel longitudinal swing suspension. Both designs allow the bogies to pivot relative to the robot body in response to low‐frequency terrain undulations, while the integrated spring dampers effectively absorb high‐frequency excitations from the ground. This mechanism ensures continuous wheel‐terrain contact on complex terrain while effectively reducing vibrations at high speeds. To assess motion smoothness and posture stability, the influence of spring‐damper deformation on the robot's attitude was first quantified. Subsequently, kinematic models of the centroid and spatial posture were established. These models determine the maximum centroid region radius and condition‐specific inverse solutions, and their validity was confirmed through multi‐body simulations, demonstrating high predictive accuracy. Field experiments show that the novel adaptive suspension reduces vertical chassis acceleration by approximately 25% compared with a rigid suspension. The integration of three adaptive suspension units significantly enhances posture stability under extreme terrain conditions, improves step‐climbing performance, and enables a payload‐to‐weight ratio of 1.41, which exceeds that of most existing wheeled platforms. Overall, the design resolves long‐standing trade‐offs among terrainability, vibration attenuation, and payload capacity, making it well‐suited for demanding tasks such as hilly farmland operations, disaster relief, and resource exploration.

RA-L 2026-04-27

Design and Modeling of Motorized Tapping Mechanism for Effective Needle Penetration

Jia Shen, Yilin Cai, Ehud Schmidt, Robert Cormack, Junichi Tokuda, Yue Chen

Manipulation & Robot Arms
Abstract

Robot-assisted needle insertion is widely employed in minimally invasive percutaneous interventions such as biopsy, brachytherapy, or ablation; however, most existing systems predominantly emphasize needle dexterity and manipulation within the soft tissue, neglecting the significant challenge of percutaneous access. In practice, conventional needle insertions often require excessive force that causes significant deformation of both tissue and needle, particularly when penetrating stiff skin or fibrotic tumors. To address this, we develop a novel tapping mechanism that generates instantaneous momentum at the needle tip to achieve efficient tissue penetration. Theoretical models are formulated to describe the force–deformation behavior and fracture mechanics induced by the tapping motion. Experiments on porcine tissue demonstrated that the tapping motion reduced the mean rupture force and tissue deformation from 9.75 N and 34.44 mm obtained via the conventional continuous insertion to 2.79 N and 19.85 mm, respectively. Furthermore, the tapping mechanism enabled successful insertion into the two-barrier porcine tissue using compliant needles that would otherwise buckle under continuous insertion, increasing the second-barrier successful penetration rate from 0/15 to 13/15.

JFR 2026-05-18

Improved ESO‐LOS Guidance Strategy for AUV: Theory and Experiment Validation

Zijian Zhu, Peizhou Du, Guocheng Zhang, Ye Li

Control & Dynamics
Abstract

To address autonomous underwater vehicle (AUV) path tracking under time‐varying disturbance, this paper proposes an Extended State Observer (ESO)‐based Line‐of‐Sight (LOS) guidance strategy that accounts for time‐varying ocean currents, model uncertainties, and drift angles. The disturbances are treated as a lumped term without assumptions on their magnitude or variation. A kinematic error model incorporating these effects is established, and an ESO is designed for real‐time disturbance estimation and compensation. An adaptive look‐ahead distance is introduced to enhance responsiveness to the AUV's tracking state. A virtual control input is then integrated into the traditional LOS framework to formulate an improved guidance law, forming a closed‐loop cascaded system of tracking and estimation errors. Using input‐to‐state stability (ISS) theory and the cascaded systems theorem, the overall system is proven to be ISS, and the ultimate bound of the cross‐track error is derived under bounded disturbances. The proposed method was compared with several existing approaches in simulations, demonstrating its superiority. Finally, three field experiments were conducted to track straight‐line, polyline, and circular paths, respectively, achieving a mean cross‐track error of no more than 0.48 m across three tested path types and an average estimation error of ESO no greater than 0.0163 m.
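
For orientation, the classical look-ahead LOS law that this work builds on can be sketched as below. This shows only the textbook baseline with an optional drift-angle correction and one commonly used error-dependent look-ahead schedule; the paper's ESO, virtual control input, and specific adaptive law are not reproduced. The function names and the parameters `d_min`, `d_max`, and `k` are hypothetical.

```python
import math


def los_heading(cross_track_err: float, path_angle: float,
                lookahead: float, drift_angle: float = 0.0) -> float:
    """Classical look-ahead LOS: steer toward a point `lookahead` metres ahead
    on the path, optionally offset by an estimate of the drift angle."""
    return path_angle + math.atan2(-cross_track_err, lookahead) - drift_angle


def adaptive_lookahead(cross_track_err: float, d_min: float = 2.0,
                       d_max: float = 10.0, k: float = 0.5) -> float:
    """Illustrative schedule (an assumption, not the paper's law): a short
    look-ahead when the error is large for faster convergence, a long one
    near the path for smoother steering."""
    return (d_max - d_min) * math.exp(-k * cross_track_err ** 2) + d_min
```

With zero cross-track error the commanded heading equals the path angle, and the correction saturates at 90 degrees as the error grows, which is what bounds how aggressively the vehicle turns back toward the path.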

JFR 2026-05-18

Modeling, Identification, and Validation of a Vector Propelled Amphibious Vehicle

Ye Wang, Sihuan Feng, Lingbo Zhang, Jiaqi Chen, Huilong Yu, Junqiang Xi

Control & Dynamics
Abstract

High‐fidelity models play an essential role in advancing the structural optimization and motion simulation of amphibious vehicles. However, the complexity of hydrodynamics poses significant challenges in dynamic modeling, parameter identification, and experimental validation. To address these challenges, this research derives a six‐degree‐of‐freedom dynamic model for a vector propelled amphibious vehicle based on maneuvering theory, including a dedicated propulsion system dynamic model. Given the system identification challenges posed by the highly coupled multi‐parameter dynamics, a systematic experimental framework is devised, featuring decoupled measurements of the propulsion and maneuvering dynamics. A staged parameter identification methodology integrating the genetic algorithm and the least squares method is proposed. The methodology initially identifies a subset of parameters through decoupled reduced‐order models, and subsequently performs a systematic identification of the remaining parameters based on the complete coupled model. For model validation, a simulation platform based on numerical integration methods is developed, with real‐time visualization implemented in Unreal Engine 4 (UE4). Field tests and hardware‐in‐the‐loop (HIL) validation demonstrate that the established model with identified parameters can accurately capture the motion characteristics of the amphibious vehicle.

JFR 2026-05-12

A Critical Review of Reinforcement Learning Algorithms for Mobile Robot Path Planning

P. Ramya, P. Natesan, S. Venkatachalam

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Mobile robots are increasingly deployed across industrial and service sectors, where autonomous navigation is required in both time‐invariant, static environments and time‐variant, dynamic environments. During navigation, robots must handle diverse obstacles, including fixed and moving objects, while minimizing travel distance, execution time, and collision risk. Although various machine‐learning‐based path planning approaches have been proposed to address these challenges, many depend on pre‐collected data sets, and obtaining such data in real‐time, unpredictable environments is difficult and often impractical. This review focuses on reinforcement‐learning‐based path planning, wherein mobile robots learn obstacle characteristics, path structure, and optimal policies directly from the environment through trial‐and‐error interaction, largely without relying on external training data. The study examines key challenges associated with autonomous navigation and analyzes reinforcement learning techniques in terms of their advantages, limitations, applications, performance metrics, obstacle categories, and obstacle avoidance mechanisms. A quantitative assessment of 58 selected papers reveals that 51 percent of the studies concentrate on local path planning, 32 percent on global planning, and 17 percent on hybrid approaches that integrate both planning strategies. These findings highlight a growing research shift towards data‐efficient reinforcement learning approaches for dynamic and uncertain environments, while global planners remain prevalent in static settings. The insights provided in this review support researchers and practitioners in selecting suitable reinforcement‐learning‐based path planning algorithms aligned with specific environmental conditions and navigation requirements.

JFR 2026-05-12

BM3D‐Based Optical Flow Tracking for Enhanced Visual Simultaneous Localization and Mapping Systems in Mobile Robotics

Yu Xin Qin, Wei Jie Zhou, Liang Long Chen, Yu Chen

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

In existing research on SLAM systems, the corner point detection problem of the vision front‐end is usually abstracted as a feature recognition problem. However, traditional corner point detection algorithms are overly sensitive to noise, scale variations, and luminance fluctuations, and so fail to fully capture the image information obtained from the vision front‐end sensors. To address this challenge, this paper proposes a new vision front‐end method built on ORB‐SLAM3, hereinafter referred to as BLO‐SLAM. It makes three main contributions: (1) an optimized system model incorporating the BM3D denoising strategy, which significantly enhances feature extraction efficiency and improves edge feature‐point matching in low‐texture and low‐light environments; (2) an improved Lucas‐Kanade (LK) optical flow tracking algorithm that captures local pixel motions more accurately in scenarios involving rapid or minimal motion, reducing feature‐point matching errors caused by camera displacement and rotation and thereby improving the robustness of the vision front‐end; and (3) the application of BLO‐SLAM to multi‐robot systems. Extensive evaluations on public and self‐constructed datasets demonstrate that the proposed method improves feature matching efficiency and accuracy while suppressing the influence of irrelevant feature points, leading to a more robust and reliable map representation.

T-RO 2026-03-26

Toward Deep Representation Learning for Event-Enhanced Visual Autonomous Perception: The eAP Dataset

Jinghang Li, Shichao Li, Qing Lian, Peiliang Li, Xiaozhi Chen, Yi Zhou

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Recent visual autonomous perception systems achieve remarkable performances with deep representation learning. However, they fail in scenarios with challenging illumination. While event cameras can mitigate this problem, there is a lack of a large-scale dataset to develop event-enhanced deep visual perception models in autonomous driving scenes. To address the gap, we present the eAP (event-enhanced Autonomous Perception) dataset, the largest dataset with event cameras for autonomous perception. We demonstrate how eAP can facilitate the study of different autonomous perception tasks, including 3D vehicle detection and object time-to-contact (TTC) estimation, through deep representation learning. Based on eAP, we demonstrate the first successful use of events to improve a popular 3D vehicle detection network in challenging illumination scenarios. eAP also enables a devoted study of the representation learning problem of object TTC estimation. We show how a geometry-aware representation learning framework leads to the best event-based object TTC estimation network that operates at 200 FPS. The dataset, code, and pre-trained models will be made publicly available for future research.

RA-L 2026-04-20

GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow

Dong-Uk Seo, Jinwoo Jeon, Eungchang Mason Lee, Hyun Myung

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Gaussian splatting has recently gained traction as a compelling map representation for SLAM systems, enabling dense and photo-realistic scene modeling. However, its application to monocular SLAM remains challenging due to the lack of reliable geometric cues from monocular input. Without geometric supervision, mapping or tracking can fall into local minima, resulting in structural degeneracies and inaccuracies. To address this challenge, we propose GaussianFlow SLAM, a monocular 3DGS-SLAM that leverages optical flow as a geometry-aware cue to guide the optimization of both the scene structure and camera poses. By encouraging the projected motion of Gaussians, termed GaussianFlow, to align with the optical flow, our method introduces consistent structural cues to regularize both map reconstruction and pose estimation. Furthermore, we introduce normalized error-based densification and pruning modules to refine inactive and unstable Gaussians, thereby contributing to improved map quality and pose accuracy. Experiments conducted on public datasets demonstrate that our method achieves superior rendering quality and tracking accuracy compared with state-of-the-art algorithms. The source code is available at: https://github.com/url-kaist/gaussianflow-slam.
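The flow-alignment idea in this abstract can be illustrated with a minimal sketch. It assumes a pinhole camera, world-to-camera poses given as `(R, t)`, and only the Gaussian centers; the function names `project` and `flow_alignment_loss` are hypothetical, and the paper's actual loss (which operates on rendered Gaussians) is likely more involved.

```python
import numpy as np

def project(points_w, R, t, K):
    """Pinhole projection of world points using a world-to-camera pose (R, t)."""
    p_cam = points_w @ R.T + t          # world -> camera frame
    uv = p_cam @ K.T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]       # perspective divide

def flow_alignment_loss(centers_w, pose_a, pose_b, K, flow_at_a):
    """Mean pixel distance between projected center motion and observed flow.

    centers_w : (N, 3) Gaussian centers in the world frame
    pose_a/b  : (R, t) tuples for the two frames (assumed world-to-camera)
    flow_at_a : (N, 2) optical flow sampled at each center's pixel in frame a
    """
    uv_a = project(centers_w, *pose_a, K)
    uv_b = project(centers_w, *pose_b, K)
    projected_motion = uv_b - uv_a
    return float(np.mean(np.linalg.norm(projected_motion - flow_at_a, axis=1)))
```

In an optimizer, a term like this would be minimized jointly over the centers and the poses, so that a flow estimate acts as a geometric cue where photometric error alone is ambiguous.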

RA-L 2026-04-20

AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM

Mirko Usuelli, David Rapado-Rincon, Gert Kootstra, Matteo Matteucci

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Autonomous robots in orchards require real-time 3D scene understanding despite repetitive row geometry, seasonal appearance changes, and wind-driven foliage motion. We present AgriGS-SLAM, a Visual–LiDAR SLAM framework that couples direct LiDAR odometry and loop closures with multi-camera 3D Gaussian Splatting (3DGS) rendering. Batch rasterization across complementary viewpoints recovers orchard structure under occlusions, while a unified gradient-driven map lifecycle executed between keyframes preserves fine details and bounds memory. Pose refinement is guided by a probabilistic LiDAR-based depth consistency term, back-propagated through the camera projection to tighten geometry-appearance coupling. We validate the system on a field platform in apple and pear orchards across dormancy, flowering, and harvesting, using a standardized trajectory protocol that evaluates both training-view and novel-view synthesis to reduce 3DGS overfitting in evaluation. Across seasons and sites, AgriGS-SLAM delivers sharper, more stable reconstructions and steadier trajectories than recent state-of-the-art 3DGS-based SLAM methods while maintaining real-time performance on-tractor. While demonstrated in orchard monitoring, the approach can be applied to other outdoor domains requiring robust multimodal perception.

RA-L 2026-04-20

From Pixels to Touch: Direct Tactile Servoing With Learned Photometric Normalization

Lluis Prior Sancho, Tommaso Belvedere, Marco Tognon

Manipulation & Robot Arms · Perception & Sensing
Abstract

Vision-Based Tactile Sensors (VBTSs) offer high-resolution contact information essential for robust robotic contact-rich manipulation under occlusions and variable lighting. Direct Visual Servoing (DVS) is an interesting alternative to classical Position-Based Visual Servoing (PBVS) because it operates directly on raw pixel intensities, which removes the need for feature extraction and simplifies the control pipeline. Applying DVS to VBTSs is challenging because the sensors' strong position-dependent lighting effects violate the brightness-constancy assumption that DVS requires. In this work, we introduce a Direct Tactile Servoing (DTS) framework that adapts the principles of DVS to VBTSs. Our approach integrates a deep convolutional UNet that maps raw RGB tactile images to spatially normalized grayscale representations, restoring photometric consistency while preserving fine contact features. We further conduct targeted ablation studies to identify training strategies that balance generalization performance and data efficiency. This learned normalization enables the direct use of DVS control laws without explicit pose or force estimation. Experiments on a robotic manipulator equipped with a GelSight Mini sensor validate robust and reactive closed-loop tactile servoing across diverse contact conditions.

RA-L 2026-04-20

Harmonic Principle-Based Adjustable Constant-Force Mechanism for Stable Human-Robot Interaction in Massage Robotics

Lixing Jin, Xuanquan Wang, Xinchun Wang, Xingguang Duan, Li Zhang

Manipulation & Robot Arms · Human-Robot Interaction / Teleoperation
Abstract

Stable and precise force contact is critical in human-robot interaction, typically requiring precise sensors and advanced algorithms. This paper presents a novel adjustable constant-force mechanism (CFM) serving as a force generator for massage robotic systems, aiming to reduce hardware costs and system development complexity. The passive CFM, comprising gears and springs, is further integrated with an active motor to enable online adjustment and continuous operation. As the primary innovation, the harmonic principle is employed for force regulation in the CFM, replacing traditional methods such as altering the number or assembly position of springs, which provides a more convenient integration with automated systems. This involves the superposition of two elastic systems with identical configurations, where only the phase angle is extracted as the sole control variable for force modulation. A prototype was constructed to experimentally validate its performance. Results demonstrate that the proposed solution achieves a maximum constant output force of 18.34 N over a 55 mm range, and it can be adjusted online to the appropriate level as required by the task.
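One way to read the phase-angle idea is through the standard sum-of-sinusoids identity. The following is only an illustration under an assumption not confirmed by the abstract: that each of the two identical elastic subsystems contributes a sinusoidal term of equal amplitude $A$, offset from the other by a phase angle $\varphi$. The symbols $A$, $\theta$, and $\varphi$ are introduced here for illustration.

```latex
F(\theta) = A\sin\theta + A\sin(\theta + \varphi)
          = 2A\cos\!\left(\tfrac{\varphi}{2}\right)\sin\!\left(\theta + \tfrac{\varphi}{2}\right)
```

Under that assumption the combined amplitude is $2A\cos(\varphi/2)$, which varies continuously from $2A$ at $\varphi = 0$ to $0$ at $\varphi = \pi$, so a single phase variable would be enough to scale the output without changing the springs themselves.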

RA-L 2026-04-20

PolarDepth: Polarization-Guided Monocular Depth for Visual Odometry

Naitri Rajyaguru, Tianfu Wang, Aryan Tajne, Botao He, Jiayi Wu, Cornellia Fermuller, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Glass surfaces remain challenging for indoor robot perception. Depth sensors and RGB-only monocular depth estimation often fail because of reflections, refractions, and low-texture regions. To this end, we present PolarDepth, a polarization-enhanced monocular depth framework for glass-dominant environments. We utilize a single polarization sensor to obtain a standard RGB image and a three-channel encoding of polarization cues, designed to be compatible with RGB-trained foundation depth models, and which we call Polar-RGB. This representation enables recovery of structure in transparent and specular regions where RGB cues are unreliable. We predict depth from both the RGB and Polar-RGB representations and fuse the predictions using a learned per-pixel reliability gate for mid-level fusion, highlighting RGB in diffuse regions and polarization on reflective surfaces. We demonstrate improvements in depth estimation and visual odometry performance over RGB-only baselines in glass-walled corridors on both real-world and synthetic data.
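The per-pixel reliability gate can be sketched as a convex blend of two depth predictions. This is a simplified illustration: in the paper the gate is learned and applied at a mid-level fusion stage, whereas here the gate is simply passed in and applied to the final depth maps. The function name `fuse_depth` is hypothetical.

```python
import numpy as np

def fuse_depth(depth_rgb, depth_polar, gate):
    """Blend two depth maps with a per-pixel gate in [0, 1].

    gate = 1 trusts the RGB prediction (e.g., diffuse regions);
    gate = 0 trusts the polarization prediction (e.g., reflective surfaces).
    """
    gate = np.clip(gate, 0.0, 1.0)      # keep the blend convex
    return gate * depth_rgb + (1.0 - gate) * depth_polar
```

Because the blend is convex, the fused depth at every pixel stays between the two input predictions, which keeps a poorly calibrated gate from producing out-of-range depths.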

RA-L 2026-04-20

AESF-LIO: Adaptive Error-State Fusion LiDAR-Inertial Odometry for Ground Vehicles in Structured Environments

Zhishuai Huang, Chao Sun, Jianghao Leng, Bo Wang

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

In LiDAR-based Simultaneous Localization and Mapping (SLAM) systems for vehicles, the point-to-plane Iterative Closest Point (ICP) method is widely used for scan matching. This approach incorporates all planar points into a single objective function for optimization, yet does not explicitly distinguish the differences in the constraints that ground and non-ground points impose on the pose variables. Ground points primarily constrain the z translation as well as roll and pitch, whereas non-ground points predominantly constrain horizontal translation and yaw. The constraints provided by non-ground points on the z translation, roll, and pitch depend on their spatial distribution and can be weak for z in environments dominated by vertical structures. When one point type dominates in number while providing weak constraints on certain pose variables, the joint optimization may become biased, resulting in insufficient refinement of the weakly constrained variables and degraded odometry accuracy. To address this issue within a point-to-plane ICP framework, this paper proposes a LiDAR-inertial odometry system that separately minimizes point-to-plane residuals for ground and non-ground points and adaptively fuses their corresponding error states to improve pose accuracy. The system employs ground segmentation to divide the point cloud into ground points and non-ground points; it then computes the error state vectors for ground and non-ground points separately and updates the pose using an adaptive weighted fusion strategy. Experiments in simulation, on public datasets, and in real-vehicle experiments demonstrate that the proposed method significantly improves accuracy compared with advanced SLAM baselines.
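The adaptive fusion step can be sketched as a per-axis weighted average of the two error states. This is an assumption-laden illustration, not the paper's fusion rule: the weights here come from hypothetical per-axis information values (for example, the diagonal of each subproblem's Gauss-Newton Hessian), and the name `fuse_error_states` is invented for this sketch.

```python
import numpy as np

def fuse_error_states(dx_ground, info_ground, dx_nonground, info_nonground, eps=1e-9):
    """Fuse two 6-DoF error states ordered as [x, y, z, roll, pitch, yaw].

    info_* are non-negative per-axis confidence values (assumed inputs).
    Each axis is weighted toward whichever point type constrains it better.
    """
    dx_g = np.asarray(dx_ground, dtype=float)
    dx_n = np.asarray(dx_nonground, dtype=float)
    info_g = np.asarray(info_ground, dtype=float)
    info_n = np.asarray(info_nonground, dtype=float)
    w_g = info_g / (info_g + info_n + eps)   # weight of the ground estimate per axis
    return w_g * dx_g + (1.0 - w_g) * dx_n
```

With ground points informative only on z, roll, and pitch, and non-ground points only on x, y, and yaw, the fused state takes each axis from the source that actually constrains it, which matches the division of labor described in the abstract.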

RA-L 2026-04-20

Robust Approach for LiDAR-Inertial Odometry Without Sensor-Specific Modeling

Meher V. R. Malladi, Tiziano Guadagnino, Luca Lobefaro, Cyrill Stachniss

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Accurate odometry is a critical component in a robotic navigation stack, and subsequent modules such as planning and control often rely on an estimate of the robot's motion. LiDAR-based odometry approaches should be robust across sensor types and deployable in different target domains, from solid-state LiDARs mounted on cars in urban-driving scenarios to spinning LiDARs on handheld packages used in unstructured natural environments. In this paper, we propose a robust LiDAR system that does not rely on sensor-specific modeling. Sensor fusion techniques for LiDAR and inertial measurement unit (IMU) data typically integrate IMU data iteratively in a Kalman filter or use pre-integration in a factor graph framework, combined with LiDAR scan matching often exploiting some form of feature extraction. We propose an alternative strategy that only requires a simplified motion model for IMU integration and directly registers LiDAR scans in a scan-to-map approach. Our approach allows us to impose a novel regularization on the LiDAR registration, improving the overall odometry performance. We provide extensive experiments on different datasets covering a wide array of commonly used robotic sensors and platforms. We show that our approach works with the exact same configuration in all these scenarios, demonstrating its robustness. We have open-sourced our implementation so that the community can build further on our work and use it in their navigation stacks.

RA-L 2026-04-20

Development of a Bernoulli Suction-Type Non-Contact Manipulator Capable of Controlling Object Position and Orientation During Holding

Yuanhao Bao, Yuichi Ambe, Takeshi Takaki

Manipulation & Robot Arms · Control & Dynamics
Abstract

In this study, we propose a Bernoulli suction-type non-contact manipulator that can attract an object and simultaneously change its in-plane position and orientation while holding it. The proposed manipulator consists of a suction plate equipped with six compressed-air nozzles. By ejecting compressed air from the nozzles toward the object surface, a low-pressure region is generated, which allows the object to be adsorbed on the suction plate. Furthermore, by independently adjusting the airflow rate of each nozzle, thrust forces are applied to the object surface, enabling translational motion with two degrees of freedom and rotational motion with one degree of freedom on the plane while the object is held. A high-speed camera is used to capture and calculate the position and orientation of the object, and feedback control is implemented to achieve precise control of its motion and posture. This paper describes the structure and working principle of the Bernoulli suction manipulator and experimentally verifies the feasibility of simultaneously controlling translational and rotational motion while maintaining suction.

RA-L 2026-04-20

RCVAFusion: 4D Radar and Camera Fusion With Virtual Points Association for 3D Object Detection

Jiehui Chen, Fuyuan Ai, Yuchen Tan, Xiaokang Qi, Chunyi Song, Zhiwei Xu

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

4D millimeter-wave radar plays a critical role in object detection for autonomous driving and robotics under all-weather and all-lighting conditions. Recently, virtual-point-based approaches have attracted widespread attention due to their ability to address radar data sparsity by complementing the depth of image instance points with the nearest 3D points. However, existing radar-camera fusion methods based on virtual points simply incorporate virtual points into the raw radar points as a form of data augmentation, overlooking the potential of virtual points that have an inherent association with both radar and images. To address this issue, we present a novel radar-camera fusion network, RCVAFusion, for 3D object detection. Specifically, we first design an association branch that employs Object Area Sampling (OAS) and Virtual-Raw Points Depth Lifting (VRPDL). This branch facilitates a deep interaction between radar geometric features and image semantic features through the medium of virtual points to generate an association feature. Then, we introduce the Dual-step Feature Aggregation (DFA) to promote feature fusion from radar, image, and association branches by establishing aggregation priorities based on feature similarity in two steps. Experimental results on the TJ4DRadSet and View-of-Delft (VoD) datasets demonstrate that our method efficiently fuses radar and camera through virtual points and achieves state-of-the-art performance.

RA-L 2026-04-20

Cross-Entropy Optimization of Physically Grounded Task and Motion Plans

Andreu Matoses Gimenez, Nils Wilde, Chris Pek, Javier Alonso-Mora

Manipulation & Robot Arms · Control & Dynamics
Abstract

Autonomously performing tasks often requires robots to plan high-level discrete actions and continuous low-level motions to realize them. Previous TAMP algorithms have focused mainly on computational performance, completeness, or optimality by making the problem tractable through simplifications and abstractions. However, this comes at the cost of the resulting plans potentially failing to account for the dynamics or complex contacts necessary to reliably perform the task when object manipulation is required. Additionally, approaches that ignore effects of the low-level controllers may not obtain optimal or feasible plan realizations for the real system. We investigate the use of a GPU-parallelized physics simulator to compute realizations of plans with motion controllers, explicitly accounting for dynamics, and considering contacts with the environment. Using cross-entropy optimization, we sample the parameters of the controllers, or actions, to obtain low-cost solutions. Since our approach uses the same controllers as the real system, the robot can directly execute the computed plans. We demonstrate our approach for a set of tasks where the robot is able to exploit the environment's geometry to move an object.
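The cross-entropy optimization loop mentioned here follows a generic pattern: sample parameters, score them, keep the best, and refit the sampling distribution. The sketch below shows that generic loop with a diagonal Gaussian; in the paper each cost evaluation would be a rollout in a GPU-parallelized physics simulator, which is replaced here by an arbitrary `cost` callable. The function name and default hyperparameters are assumptions.

```python
import numpy as np

def cross_entropy_minimize(cost, mean, std, n_samples=64, n_elite=8, n_iters=30, seed=0):
    """Generic cross-entropy method with a diagonal Gaussian sampling distribution.

    cost : callable mapping a parameter vector to a scalar (stands in for a
           simulated plan rollout; an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float).copy()
    std = np.asarray(std, dtype=float).copy()
    for _ in range(n_iters):
        # Sample candidate controller/action parameters.
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        costs = np.array([cost(s) for s in samples])
        # Keep the lowest-cost candidates and refit the distribution to them.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6   # small floor avoids a degenerate distribution
    return mean
```

Because the method only needs cost values, it works with non-differentiable simulators and contact-rich rollouts, at the price of many evaluations per iteration, which is where parallel simulation helps.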

RA-L 2026-04-20

ClaRO: A Cluster-Based Method for Radar Odometry

Eike Furuno, Pranav Megarajan, Christian Kowalski, Tim C. Stratmann, Max Pfingsthorn, Andreas Hein

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

We present ClaRO, an unsupervised, cluster-based odometry pipeline for 4D imaging radar to improve robustness and reduce long-term drift for radar-only odometry. Our core idea is to utilize density-based clustering to improve ego-motion estimation using per-cluster Doppler-residuals and find a local best result to seed a robust least squares in addition to finding static points for pose estimation. The resulting global inlier mask is used to refine the pose via a weighted iterative closest point (ICP) approach that fuses Doppler and radar cross-section (RCS) cues while down-weighting the radar's weak elevation axis. The method is fully unsupervised and sensor-agnostic. We compare our method on multiple public datasets (View-of-Delft, HeRCULES, and NTU4DRadLM) and show that our approach provides results that match or exceed recent radar-only odometry baselines (Radar4Motion, EFEAR-4D) and pose graph based 4DRadarSLAM. Our cluster-wise approach significantly reduces long-term drift, achieving a 72.0% improvement in mean absolute trajectory error (ATE) and a 73.2% improvement in absolute rotation error (ARE) compared to state-of-the-art radar odometry methods on the HeRCULES dataset. While remaining competitive in relative pose estimation, our method improves global trajectory consistency and demonstrates robustness across different radar sensors and environments on the VoD and NTU4DRadLM benchmarks.
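The Doppler-based ego-motion step builds on a standard relation: for a static point seen along unit direction d, the measured radial velocity is the negative projection of the sensor velocity onto d. The sketch below solves that relation by ordinary least squares and reports residuals grouped by cluster label. It is a plain baseline under stated assumptions (Doppler positive when the target recedes, cluster labels supplied externally), not ClaRO's seeding or robust estimation scheme; both function names are hypothetical.

```python
import numpy as np

def ego_velocity_from_doppler(points, doppler):
    """Least-squares sensor velocity from radar points assumed to be static.

    points  : (N, 3) positions in the sensor frame
    doppler : (N,) radial velocities, positive when the target moves away
    """
    dirs = points / np.linalg.norm(points, axis=1, keepdims=True)
    # Static-world model: doppler_i = -dirs_i . v_sensor
    v, *_ = np.linalg.lstsq(dirs, -doppler, rcond=None)
    residuals = dirs @ v + doppler
    return v, residuals

def mean_abs_residual_per_cluster(residuals, labels):
    """Average absolute Doppler residual for each cluster label."""
    return {int(c): float(np.mean(np.abs(residuals[labels == c])))
            for c in np.unique(labels)}
```

Clusters with large mean residuals are candidates for moving objects, so a per-cluster summary like this gives a natural way to decide which points to exclude before pose refinement.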

RA-L 2026-04-20

Olaf: Bringing an Animated Character to Life in the Physical World

David Müller, Espen Knoop, Dario Mylonopoulos, Agon Serifi, Michael A. Hopkins, Ruben Grandia, et al.

Humanoid Robots · Robot Learning
Abstract

Animated characters often move in non-physical ways and have proportions that are far from a typical walking robot. This provides an ideal platform for innovation in both mechanical design and stylized motion control. In this paper, we bring Olaf to life in the physical world, relying on reinforcement learning guided by animation references for control. To create the illusion of Olaf's feet moving along his body, we hide two asymmetric legs under a soft foam skirt. To fit actuators inside the character, we use spherical and planar linkages in the arms, mouth, and eyes. Because the walk cycle results in harsh contact sounds, we introduce additional rewards that noticeably reduce impact noise. The large head, driven by small actuators in the character's slim neck, creates a risk of overheating, amplified by the costume. To keep actuators from overheating, we feed temperature values as additional inputs to policies, introducing new rewards to keep them within bounds. We validate the efficacy of our modeling in simulation and through experiments on the robot.

RA-L 2026-04-20

FlyOrb: Spherical Terrestrial-Aerial Bimodal Vehicle

Xiao Li, Ke Huang, Hongkai Wang, Jiaying Zhang

UAVs / Aerial Robots · Legged / Quadruped Robots
Abstract

Unmanned aerial vehicles perform well in aerial tasks but still face limitations due to limited endurance. In this work, we present a novel spherical terrestrial–aerial bimodal vehicle, called FlyOrb, which is capable of both rolling and flying. The vehicle integrates inertial wheel actuation with a concise mechanism based on an unactuated hinge structure, enabling quadrotor-like flight and spherical-robot-like rolling. Furthermore, the mass eccentricity design allows the vehicle to be self-stabilizing on the ground but makes it difficult to initiate rolling. We propose a resonance-driven rolling strategy for initiating and sustaining efficient rolling. For both slopes and steps, the vehicle's traversal capabilities were analyzed theoretically and tested experimentally. Thrust experiments identify the more aerodynamically efficient structure, which retains 87.2% of the thrust compared to the configuration without the arm. Further, we analyze the impact of FlyOrb's design on its flight maneuverability. Finally, flight and rolling tests validate the vehicle's multimodal locomotion capabilities. In ground mode, the average speed is 0.3505 m/s, and the power saving efficiency reaches 96.8%.

RA-L 2026-04-13

Dynamic Legged Ball Manipulation on Rugged Terrains With Hierarchical Reinforcement Learning

Dongjie Zhu, Zhuo Yang, Xuesong Li, Wenjun Xu, Qi Liu, Xiang Li

Legged / Quadruped Robots · Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Achieving reliable object manipulation while traversing complex terrains is the missing link between agile quadruped locomotion and practical autonomy. Specifically, using traditional end-to-end reinforcement learning (RL) for dynamic ball manipulation in rugged environments presents two key challenges. The first is coordinating distinct motion modalities to integrate terrain traversal and ball control seamlessly. The second is overcoming sparse rewards in end-to-end RL, which impedes efficient policy convergence. To address these challenges, we propose a hierarchical RL framework. A high-level policy, informed by proprioceptive data and ball position, adaptively switches between pre-trained low-level skills such as ball dribbling and rough terrain navigation. We further propose Dynamic Skill-Focused Policy Optimization to suppress gradients from inactive skills and enhance critical skill learning. Both simulation and real-world experiments validate that our method outperforms baseline approaches in dynamic ball manipulation across rugged terrains, highlighting its effectiveness in challenging environments.

RA-L 2026-04-13

Topology-Optimized, Dual-Phase Gripper With Force Estimation for Underwater Operation

Shuqiao Zhong, He Zheng, Yihong Yao, Fang Wan, Zhiyuan Zhou, Chaoyang Song, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Human-Robot Interaction / Teleoperation
Abstract

Addressing the critical trade-off between compliance and grasping force in underwater manipulation, this paper presents a novel topology optimization framework to automate the design of soft, variable-stiffness fingers from a single material. By employing multiple load cases within the optimization objective, our framework automatically synthesizes a finger structure that preserves the adaptive Fin Ray effect while exhibiting programmed, multi-stage stiffness. We utilize this method to realize a dual-phase finger, characterized by low initial stiffness for compliant contact with a compliant object and high subsequent stiffness for secure grasping. Quantitative comparisons validate this achievement with the optimized design yielding a grasping force 3.6 times higher than that of a Fin Ray finger with comparable softness, while simultaneously achieving an adaptation 2.8 times higher than that of a high-force Fin Ray variant. To enable damage-aware teleoperation, a flex sensor is embedded within the finger structure. We establish a mapping to grasping force via a piecewise-regressed model, whose structure directly reflects the finger's dual-phase mechanical behavior. The gripper is validated through a series of underwater experiments ranging from controlled laboratory tests to a nearshore field trial. This work establishes a new pathway for designing automated, single-material soft grippers with direct applications in complex underwater environments.

RA-L 2026-04-13

CVPC: Cross-Modal Visual-Guided Point Cloud Completion

Bhanu Pratap Paregi, Vaibhav Kumar

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Robust 3D scene understanding from partial sensor data is critical to autonomous robotic systems. Robots in unstructured environments face occlusions and narrow fields of view, requiring accurate point cloud completion for manipulation, navigation, and autonomous driving. We present CVPC, a method that fuses geometric transformers with category-level multiview visual priors to avoid per-sample image alignment. CVPC stores one diffusion-derived prototype per category and injects semantic cues through cross-modal attention to recover edges and thin structures critical for grasp planning and collision avoidance. The alignment-free design supports real-time deployment on embedded platforms. CVPC achieves state-of-the-art results on ShapeNet-55 [28] ($0.73\times 10^{-3}$ Chamfer Distance-L2, CD-L2), PCN [7] manipulation objects ($6.32\times 10^{-3}$ Chamfer Distance-L1, CD-L1), and KITTI-Cars [27] (0.388 Minimum Matching Distance, MMD). Qualitative results demonstrate improved completion of chair legs, car roofs, and lamp stems relative to geometry-only baselines.
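For readers unfamiliar with the reported metrics, the two Chamfer Distance variants can be sketched as follows. Normalization conventions differ between benchmarks; this sketch assumes one common convention (CD-L2 sums the two mean squared nearest-neighbor distances, CD-L1 averages the two mean Euclidean nearest-neighbor distances), which the abstract does not confirm. The brute-force pairwise computation is only suitable for small clouds.

```python
import numpy as np

def chamfer_distance(a, b, variant="L2"):
    """Chamfer Distance between point clouds a (N, 3) and b (M, 3).

    variant="L2": mean squared NN distance a->b plus b->a (assumed convention)
    variant="L1": average of mean Euclidean NN distance a->b and b->a (assumed)
    """
    diff = a[:, None, :] - b[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)          # (N, M) squared pairwise distances
    if variant == "L2":
        return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())
    if variant == "L1":
        d = np.sqrt(sq)
        return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))
    raise ValueError("variant must be 'L1' or 'L2'")
```

Both variants are symmetric and equal zero only when every point in each cloud has an exact match in the other, so lower values indicate a completion that is closer to the ground-truth shape.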

JFR 2026-05-10

The Evolution of Autonomous Systems for Planetary Cave Exploration: A Review

Sarah Swinton, Daniel Mitchell, Jamie Blanche, Euan McGookin, David Flynn

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

The exploration of Subsurface Access Points (SAPs), such as lava tubes on the Moon and Mars, has gained significant interest due to their potential as stable environments shielded from surface radiation and temperature extremes. These sites are considered high‐value targets for detecting water and signs of ancient life, and assessing their suitability as habitats for human missions. However, SAP exploration presents significant challenges, including navigating unknown and hazardous terrains, operating in low‐light conditions, and managing limited communication capabilities. Recent advances in high‐resolution imaging, Synthetic Aperture Radar, and other sensing technologies have enabled better identification and characterization of SAPs, providing critical data for potential exploration missions. This review presents a structured critical analysis of the challenges in planetary cave exploration and evaluates the state‐of‐the‐art robotic platforms which offer a cost‐effective and safe alternative to human exploration in hazardous environments, in addition to sensor technologies that aid the understanding of SAPs, such as seismic studies, geological characterization, and biosignature detection. This article emphasizes the advantages of multirobot teams in generating comprehensive data sets and improving mission resilience. By combining the unique capabilities of heterogeneous robotic systems, these teams represent a crucial step toward enabling the exploration of SAPs and advancing our understanding of planetary subsurface environments.

JFR 2026-05-10

Simulation Platforms for Underwater Robotic Applications: Architectures, Capabilities, and Research Directions

Subham Kumar Shaw, Prasanna Muppidwar, Jagadeesh Kadiyam

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

As underwater robotics advances, simulation platforms have become essential for enhancing research, development, and operational strategies. These platforms are crucial because they lower vehicle fabrication costs, mitigate risks, and recreate intricate marine environments. This review offers a detailed examination of prominent underwater simulators, emphasizing essential factors such as environmental modeling, robot kinematics and dynamics, control systems and navigation, sensor emulation, communications and their integration with artificial intelligence and machine learning workflows. We have given special focus to hydrodynamic modeling, visual rendering, and the simulation of realistic underwater phenomena like turbidity, wave interactions, and marine habitat dynamics. The review also evaluates each simulator's effectiveness in operator training, technology validation, and planning for multi‐robot missions. By comparing their designs, advantages, limitations, and specific applications, this study aims to assist in choosing suitable simulation tools. It also outlines potential developments to improve simulation accuracy and interoperability within underwater robotics.

T-RO 2026-03-24

Safe and Agile Transportation of Cable-Suspended Payload via Multiple Aerial Robots

Yongchao Wang, Junjie Wang, Xiaobin Zhou, Tiankai Yang, Xin Zhou, Chao Xu, et al.

UAVs / Aerial Robots · Multi-Robot / Swarms
Abstract

Transporting a heavy payload using multiple aerial robots (MARs) is an efficient way to extend the load capacity of a single aerial robot. However, existing planning schemes for the multiple aerial robots transportation system (MARTS) still lack the capability to generate a collision-free and dynamically feasible trajectory in real time. Therefore, they are limited to low-agility transportation in simple environments. To bridge the gap, we propose a complete planning scheme for the MARTS, achieving safe and agile aerial transportation (SAAT) of a cable-suspended payload in complex environments. We derive a flatness map for the motor revolutions per minute (RPM) of each aerial robot that accounts for the complete kinematic constraint and the dynamical coupling between each aerial robot and the payload. To generate safe, dynamically feasible, and agile trajectories responsively in complex environments, we propose a real-time spatio-temporal trajectory planning scheme for the MARTS. In addition, we remove the reliance on state measurements of the payload and cables, as well as on closed-loop control of the payload, and integrate a fully distributed control scheme to track the agile trajectory that is robust against imprecise payload mass, non-point-mass payloads, wind disturbances, and communication delays. The proposed schemes are extensively validated through benchmark comparisons, ablation studies, and simulations. Finally, extensive real-world experiments are conducted on practical MARTSs containing different numbers of aerial robots with onboard computers and sensors. The results validate the efficiency and robustness of our proposed schemes for SAAT in complex environments.

T-RO 2026-03-24

Spatial Balancing for RGB-Thermal Semantic Segmentation in Autonomous Driving: A Study From Analysis to Improvement

Haotian Li, Henry K. Chu, Yuxiang Sun

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Semantic segmentation based on RGB-Thermal (RGB-T) data fusion has made great progress in the field of autonomous driving. However, we find that most existing RGB-T semantic segmentation methods exhibit inferior performance in image central regions, in which segmentation performance is critical for driving safety. We refer to this phenomenon as spatial bias. To discover the reason for spatial bias, we design a series of experiments. The results challenge the common knowledge that more training data lead to better segmentation performance, and reveal a close causal relationship between segmentation performance and object complexity as well as image quality. We also provide a theoretical interpretation for the causal relationship using information theory and feature space analysis. Based on the findings, we propose a Gaussian-guided regional balancing masking method to balance segmentation performance across different image regions. Moreover, we introduce a spatial-weighted loss to further enhance the overall segmentation performance. Experimental results on two public datasets demonstrate the effectiveness of our method in mitigating spatial bias and improving balanced performance.

T-RO 2026-03-24

VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation

Bangguo Yu, Yuzhen Liu, Lei Han, Hamidreza Kasaei, Tingguang Li, Ming Cao

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Following human instructions to explore and search for a specified target in an unfamiliar environment is a crucial skill for mobile service robots. Most of the previous works on object goal navigation have typically focused on a single input modality as the target, which may lead to limited consideration of language descriptions containing detailed attributes and spatial relationships. To address this limitation, we propose VLN-Game, a novel zero-shot framework for visual target navigation that can process object names and descriptive language targets effectively. To be more precise, our approach constructs a 3D object-centric spatial map by integrating pre-trained visual-language features with a 3D reconstruction of the physical environment. Then, the framework identifies the most promising areas to explore in search of potential target candidates. A game-theoretic vision-language model is employed to determine which target best matches the given language description. Experiments conducted on the Habitat-Matterport 3D (HM3D) dataset demonstrate that the proposed framework achieves state-of-the-art performance in both object goal navigation and language-based navigation tasks. Moreover, we show that VLN-Game can be easily deployed on real-world robots. The success of VLN-Game highlights the promising potential of using game-theoretic methods with compact vision-language models to advance decision-making capabilities in robotic systems. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/vln-game.

T-RO 2026-03-24

A Magnetic Capsule for Navigation and Multi-targeted Sampling in the Gastrointestinal Tract

Ziheng Chen, Huayang Ren, Zhaokai Wang, Jingfang Han, Jiaqing Xie, Ruicheng Li, et al.

Legged / Quadruped Robots; Navigation / SLAM / Autonomous Driving
Abstract

Untethered capsules are capable of entering the gastrointestinal (GI) tract and collecting fluid samples containing microbial communities from specific locations, facilitating the study of chronic diseases. However, existing sampling capsules are designed for single-site sampling, making it challenging to gather samples from multiple targets. This paper reports a magnetic-driven capsule for multiple sampling within the GI tract and an on-demand magnetic-triggered fluid sampling strategy. The capsule consists of a body, a magnetic-triggered negative pressure unit, and a reservoir unit. Composed of an elastic membrane and Magnet I, the negative pressure unit controls pressure change inside the capsule cavity on demand to pump the sample by switching the magnetic field, while the embedded Magnet I also enables real-time magnetic localization for regional targeting and position tracking. The reservoir unit integrates three sampling papers for fluid absorption, two waterproof layers that maintain contamination levels below 25% to ensure reliable multi-site sampling, and a rotating arm embedded with Magnet II for posture adjustment of the sampling paper. The pumping and storage performance of the capsule was systematically evaluated and optimized. Meanwhile, the capsule, actuated by an external magnetic field, was evaluated for its active locomotion performance. Finally, the feasibility of using the capsule to perform active navigation and multi-target sampling in a porcine intestine was validated via ex vivo experiments.

T-RO 2026-03-24

From Hitch to Lift: Autonomous Cable Interlacing by Multi-UAV Teams for Aerial Grasping and Transportation

Diego S. D'Antonio, Tongshu Wu, Subhrajit Bhattacharya, David Saldaña

UAVs / Aerial Robots; Manipulation & Robot Arms
Abstract

The use of cables in aerial manipulation offers a lightweight and flexible alternative to rigid grasping mechanisms. However, achieving autonomous tying and secure transportation of objects with cables remains a significant challenge. In this work, we present a novel method for autonomously securing and transporting objects using multi-layer hitches formed in midair by a team of aerial robots. Building upon prior work on polygonal hitch formation, we extend the framework to include layered cable interlacing that increases frictional grip and enables secure object grasping. We introduce two new manipulation actions: multi-layer tying and autonomous object release, completing the pipeline for aerial grasping and transportation. We develop a capstan-based analytical model that establishes an exponential scaling law for the effect of cable layers and provides a conservative guideline. The formation algorithm operates in parallel, ensuring scalability to large teams with constant execution time. We validate our system through simulation and hardware experiments, demonstrating fully autonomous object tying, lifting, and releasing using cables alone without human intervention.
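
The exponential scaling mentioned in the abstract is characteristic of the classical capstan (Euler-Eytelwein) relation, T_load = T_hold * exp(mu * phi). The sketch below only illustrates that textbook relation and is not the paper's analytical model. The friction coefficient, the wrap angle per layer, and the assumption that every cable layer adds the same wrap angle are all hypothetical.

```python
import math


def capstan_load_capacity(hold_tension_n, friction_coeff, wrap_angle_rad):
    """Largest load-side tension a wrapped cable resists before slipping.

    Textbook capstan relation: T_load = T_hold * exp(mu * phi).
    """
    return hold_tension_n * math.exp(friction_coeff * wrap_angle_rad)


def capacity_per_layer(hold_tension_n, friction_coeff, wrap_per_layer_rad, max_layers):
    """Capacity for 1..max_layers layers, assuming (hypothetically) that each
    layer contributes the same additional wrap angle."""
    return [
        capstan_load_capacity(hold_tension_n, friction_coeff, wrap_per_layer_rad * n)
        for n in range(1, max_layers + 1)
    ]


if __name__ == "__main__":
    # Hypothetical values: 1 N holding tension, mu = 0.3, half a turn per layer.
    for layers, capacity in enumerate(capacity_per_layer(1.0, 0.3, math.pi, 4), start=1):
        print(f"{layers} layer(s): {capacity:.2f} N")
```

Under these assumptions each extra layer multiplies the holding capacity by the same factor, exp(mu * phi), which is what makes the scaling exponential rather than linear in the number of layers.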

IJRR 2026-03-27

A manifold model predictive controller for agile pose trajectory tracking of an orbital space robot

Shuai Wang, Haiyan Hu, Ju Chen, Xiaodong Song, Qiang Tian

Manipulation & Robot Arms; Control & Dynamics
Abstract

The control of an orbital space robot is challenging due to the strong nonlinear dynamic coupling between the floating base spacecraft and the equipped manipulator. To address this problem effectively, this paper develops a geometric control framework by identifying and exploiting the Lie group structures of the space robot. The paper shows how to formulate the system momentum evolution equations as a set of first-order ordinary differential equations. Then, it discusses the designs of the Lie-algebra proportional-integral controller and the manifold model predictive controller to perform the three-dimensional pose trajectory tracking task. For the manifold model predictive controller, the paper presents the structure-preserving direct-collocation method to enforce the discrete dynamic constraints in a finite-horizon optimal control problem. Furthermore, it presents the performance comparisons of the above two controllers in numerical simulations, and emphasizes the significance of computational accuracy and efficiency, momentum shaping and prediction horizon selection for the manifold model predictive controller, with detailed benchmarks against the classic Euclidean model predictive controller. Finally, the paper demonstrates the trajectory tracking and object capturing experiments in a three-dimensional space via an air-bearing space robot simulator.

RA-L 2026-04-22

LoD-GS: Robust and Lightweight Gaussian Splatting SLAM for Real-Time Volumetric Scene Reconstruction

Jiachen Wang, Seung-Hyun Kong

Navigation / SLAM / Autonomous Driving
Abstract

Real-time 3D reconstruction is becoming a key enabler for robotics, mixed reality, and autonomous vehicles. Recent advances in 3D Gaussian Splatting (3DGS) have enabled high-fidelity volumetric modeling, and their integration with SLAM shows strong potential for real-time deployment. However, the substantial size of 3DGS models limits deployment on heterogeneous devices, while their rendering quality remains highly sensitive to tracking accuracy under motion blur and abrupt texture variations. In this work, we propose LoD-GS, a lightweight and robust 3DGS-SLAM framework that produces compact yet high-fidelity Gaussian scene representations for flexible deployment. LoD-GS integrates entropy-driven scene-complete volumetric mapping to improve pose quality and Gaussian initialization, a geometry-aware rendering quality optimizer that emphasizes near-field and structure-rich regions under limited optimization budgets, and a deployment-aware level-of-detail 3DGS compression module that enables adaptive resource-quality tradeoffs. Extensive experiments on public benchmarks and real-world office sequences demonstrate its effectiveness, reducing model size by up to 53.8%, increasing rendering FPS by up to 43.72%, and improving PSNR by up to 2.471 dB.

RA-L 2026-04-22

Decoupled Heuristic Multi-Vehicle Emergency Trajectory Planning for Sudden Obstacles

Dengyu Xiao, Zhenyang Zeng, Chuan Tong, Mengdie Huang, Gang Wang, Jun Luo, et al.

Navigation / SLAM / Autonomous Driving
Abstract

The emergence of sudden obstacles can significantly reduce the feasible space and may induce locally non-convex or fragmented space, especially in densely clustered scenarios, making vehicle trajectory planning remarkably challenging. Current methods face computational bottlenecks when generating emergency trajectories under such tight real-time constraints. To address this issue, we decouple safety-critical guidance from trajectory optimization for suddenly appearing obstacles. Specifically, a novel unified nonpositivity quantification method based on vector cross-product consistency is introduced to numerically constrain non-convex regions, and a heuristic risk metric is designed to guide the optimization of the avoidance target. Additionally, a dynamic priority strategy is designed to adaptively adjust the constraint dimensionality in real time, improving the success rate of emergency planning. Comparative evaluations with existing emergency planning methods demonstrate the superiority of the proposed approach in terms of success rate, planning time, and emergency trajectory length. Finally, several real-world multi-vehicle experiments validate the effectiveness and practical applicability of the proposed method.

RA-L 2026-04-22

Multi-Priority Reactive Motion Control for Safe and Coordinated Dual-Arm Manipulation in Dynamic Environments

Jichuan Yu, Jizhou Yan, Zhao Jin, Chuxiong Hu, Ze Wang

Manipulation & Robot Arms
Abstract

Reactive motion generation for dual-arm robotic systems is challenging due to their high degrees of freedom, nonlinear characteristics, and the presence of multiple constraints, including kinematic limits, collision avoidance, dual-arm coordination, and other task-specific requirements. These constraints may become incompatible, especially in dynamic operational scenarios. This paper presents a multi-priority reactive motion control framework to address the above challenges. First, a novel time-varying control barrier function leveraging multi-body distance blending is proposed to formulate a dynamic whole-body collision avoidance constraint. Then, a constraint prioritization mechanism is introduced to incorporate multiple task objectives into a single optimization-based controller, where the constraints are resolved in strict order of priority using hierarchical quadratic programming. The proposed control framework is extensively validated in both simulations and real-world experiments, with results consistently demonstrating its ability to generate both safe and coordinated reactive motions across various dual-arm collaboration tasks.

RA-L 2026-04-27

Field Validation of Prior-Based Image Compression for Tetherless Operation of Underwater Remotely Operated Vehicles

Luyuan Peng, Yuen Min Too, Mandar Chitre, Hari Vishnu, Bharath Kalyan, Rajat Mishra, et al.

Abstract

Efficient visual communication is critical for tetherless operation of underwater remotely operated vehicles, where acoustic links severely constrain bandwidth. Prior work introduced NVSPrior, which uses novel view synthesis with 3D Gaussian Splatting to encode scene priors, together with iNVS, a gradient-based refinement strategy for improving reconstruction quality. However, its performance degrades in real-world environments due to turbidity, lighting variability, and dynamic scene elements. This paper presents a systematic field evaluation of NVSPrior+iNVS in turbid natural waters using ROV trials off St. John's Island, Singapore. To improve robustness, we introduce iNVS-w, which combines a DFNet-inspired pose regressor with a perceptual refinement loss. Benchmarking against classical and learned codecs shows that iNVS-w achieves substantially lower bitrate than scene-agnostic baselines while maintaining high perceptual fidelity on realistic field imagery. Ablation studies further quantify the role of initialization, loss functions, and feature extractors. These results provide a field-based assessment of prior-based image compression and identify practical modifications needed for robust operation in bandwidth-constrained underwater inspection.

RA-L 2026-04-27

Stiffness Map Generation for Soft Materials Using Axis-Aligning Non-Contact Measuring Device

Taiki Yamaguchi, Makoto Kaneko, Kensuke Harada

Abstract

This paper presents a novel non-contact sensing device for constructing stiffness distribution maps on soft-material surfaces. Since directly contacting and measuring each point is inefficient, the device applies an air jet to the surface to induce deformation, and the resulting displacement is measured with a laser displacement sensor. By employing a confocal laser displacement sensor and aligning illumination and collection on the same optical axis, the actuation point of the jet and the sensing point coincide, enabling stable measurements even on tilted or locally rough surfaces. The device is compact and lightweight, allowing it to be mounted on the wrist of a robot. We further introduce an estimation algorithm that quantitatively evaluates stiffness in both the normal and tangential directions of the surface. Experimental results demonstrate that, unlike conventional contact-based methods, the proposed approach can rapidly generate stiffness maps with sufficient accuracy even on elastic surfaces whose stiffness varies spatially.

RA-L 2026-04-09

LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation

Motonari Kambara, Koki Seno, Tomoya Kaichi, Yanan Wang, Komei Sugiura

Manipulation & Robot Arms; Robot Learning; Perception & Sensing
Abstract

We address language-conditioned robotic manipulation using flow-based trajectory generation, which enables training on human and web videos of object manipulation and requires only minimal embodiment-specific data. This task is challenging, as object trajectory generation from pre-manipulation images and natural language instructions requires appropriate instruction-flow alignment. To tackle this challenge, we propose the flow-based Language Instruction-guided open-Loop ACtion generator (LILAC). This flow-based Vision-Language-Action model (VLA) generates object-centric 2D optical flow from an RGB image and a natural language instruction, and converts the flow into a 6-DoF manipulator trajectory. LILAC incorporates two key components: Semantic Alignment Loss, which strengthens language conditioning to generate instruction-aligned optical flow, and Prompt-Conditioned Cross-Modal Adapter, which aligns learned visual prompts with image and text features to provide rich cues for flow generation. Experimentally, our method outperformed existing approaches in generated flow quality across multiple benchmarks. Furthermore, in physical object manipulation experiments using free-form instructions, LILAC demonstrated a superior task success rate compared to existing methods. The project page is available at https://lilac-75srg.kinsta.page/.

RA-L 2026-04-09

Open-Set Tactile Recognition Using Regression of Mechanical Properties

Pakorn Uttayopas, Xiaoxiao Cheng, Jonathan Eden, Etienne Burdet

Manipulation & Robot Arms; Navigation / SLAM / Autonomous Driving; Perception & Sensing; Human-Robot Interaction / Teleoperation
Abstract

Tactile exploration enables robots to acquire rich mechanical information about objects through physical interaction and enhances their perception and manipulation in unstructured environments. However, existing tactile object recognition methods are limited by closed-set assumptions, recognizing only predefined categories and relying heavily on labeled data. This limits their ability to handle unseen objects. Here we propose an open-set tactile recognition approach that broadens the ability of robots to identify known and novel objects. The approach leverages the intrinsic mechanical properties of objects, estimated online through haptic interaction. It integrates supervised learning for recognizing known objects with unsupervised clustering for characterizing novel ones. The inclusion of new objects is enabled by regression-based mechanical property estimation combined with distance-based online clustering. This allows robots to generalize effectively and extract reliable features from unseen objects, improving perception in dynamic environments. Validation on a 20-object dataset with diverse mechanical characteristics shows the efficacy of the approach, achieving a 96.02 ± 1.69% recognition rate for 12 known objects and detecting 8 novel objects with 90.79 ± 5.45% accuracy. After integrating the newly identified objects into the robot's knowledge base, the approach achieves an Adjusted Rand Index of 0.701 ± 0.096, confirming effective clustering. These results show the potential of the proposed method to advance open-set tactile perception and support more adaptive robot interaction in real-world settings.
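
To make the idea of distance-based online clustering concrete, here is a minimal sketch, assuming a simple nearest-centroid rule with a fixed novelty threshold. It is not the paper's pipeline, which pairs a supervised classifier for known objects with its own clustering scheme. The class name, the two-dimensional feature vectors (standing in for estimated mechanical properties), and the threshold value are hypothetical.

```python
import math


class OnlineOpenSetClusters:
    """Nearest-centroid assignment that opens a new cluster for distant samples."""

    def __init__(self, known_centroids, novelty_threshold):
        # Known objects start with one centroid each (assumed to come from training).
        self.centroids = {label: list(c) for label, c in known_centroids.items()}
        self.counts = {label: 1 for label in known_centroids}
        self.novelty_threshold = novelty_threshold
        self._next_novel_id = 0

    def _nearest(self, features):
        best_label, best_dist = None, math.inf
        for label, centroid in self.centroids.items():
            dist = math.dist(centroid, features)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label, best_dist

    def assign(self, features):
        """Return the label for a sample, creating a novel cluster if needed."""
        label, dist = self._nearest(features)
        if label is None or dist > self.novelty_threshold:
            label = f"novel_{self._next_novel_id}"
            self._next_novel_id += 1
            self.centroids[label] = list(features)
            self.counts[label] = 1
            return label
        # Running-mean update of the matched centroid.
        self.counts[label] += 1
        n = self.counts[label]
        self.centroids[label] = [
            c + (f - c) / n for c, f in zip(self.centroids[label], features)
        ]
        return label


if __name__ == "__main__":
    # Hypothetical (stiffness, damping) features in arbitrary units.
    clusters = OnlineOpenSetClusters(
        {"sponge": (0.2, 0.1), "wood": (5.0, 0.5)}, novelty_threshold=1.0
    )
    for sample in [(0.3, 0.1), (10.0, 2.0), (10.2, 2.1)]:
        print(sample, "->", clusters.assign(sample))
```

In this toy setup the second sample is far from both known centroids and opens a new cluster, and the third sample then joins that cluster instead of being treated as yet another unknown.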

RA-L 2026-04-09

Hippo: High-Performance Interior-Point and Projection-Based Solver for Generic Constrained Trajectory Optimization

Haizhou Zhao, Ludovic Righetti, Majid Khadiv

Legged / Quadruped Robots; Manipulation & Robot Arms; Navigation / SLAM / Autonomous Driving
Abstract

Trajectory optimization is the core of modern model-based robotic control and motion planning. Existing trajectory optimizers, based on sequential quadratic programming (SQP) or differential dynamic programming (DDP), are often limited by their low computational efficiency, low modeling flexibility, and poor convergence for complex tasks requiring hard constraints. In this paper, we introduce Hippo, a solver that handles inequality constraints using the interior-point method (IPM) with an adaptive barrier update strategy and hard equality constraints via projection or IPM. Through extensive numerical benchmarks, we show that Hippo is a robust and efficient alternative to existing state-of-the-art solvers for difficult robotic trajectory optimization problems requiring high-quality solutions, such as locomotion and manipulation.

RA-L 2026-04-09

ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments With Low-Cost Sensors

Shivendra Agrawal, Jake Brawer, Ashutosh Naik, Alessandro Roncone, Bradley Hayes

Navigation / SLAM / Autonomous Driving; Perception & Sensing; Medical / Soft / Micro-Nano; Human-Robot Interaction / Teleoperation
Abstract

Many indoor workspaces are quasi-static: their global geometric layout is stable, but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat standard vision-based localization. We present ShelfAware, a semantic particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed quantity landmarks. ShelfAware fuses a depth likelihood with a category-centric semantic similarity and uses a precomputed bank of semantic viewpoints to perform inverse semantic proposals inside Monte Carlo Localization (MCL), yielding fast, targeted hypothesis generation on low-cost, vision-only hardware. To demonstrate perception-agnostic scalability, we evaluate ShelfAware across two domains. In a rigorously controlled mock retail environment, ShelfAware achieves a 97% global localization success rate, maintaining the highest tracking success (66%) across cart, wearable, and dynamic occlusion conditions. Furthermore, in a 3,500 sq. ft. operational grocery store leveraging an open-vocabulary vision pipeline, ShelfAware significantly outperforms both geometric and fixed-quantity semantic baselines. By modeling semantics distributionally and leveraging inverse proposals, ShelfAware resolves geometric aliasing, providing an infrastructure-free building block for mobile and assistive robots in dynamic real-world environments.

RA-L 2026-04-09

Concurrent Learning With Triangle-Based Cooperative Correction for Multi-Robot Relative Localization and Formation Control

Chuanhai Yang, Jingyi Huang, Qingshan Liu

Navigation / SLAM / Autonomous Driving; Robot Learning; Multi-Robot / Swarms
Abstract

This letter presents a framework for cooperative relative localization and formation control of multi-robot systems in GPS-denied environments. First, a concurrent learning-based scheme with a sliding-window sampling strategy is developed to exploit historical information, relaxing the strict persistent excitation requirement to a finite excitation condition. Second, a novel triangle-based cooperative correction mechanism is introduced to mitigate the accumulation of local biases, thereby significantly enhancing estimation accuracy. Building upon the precise localization, a composite controller is designed to achieve formation maintenance and inter-robot collision avoidance simultaneously. Extensive simulations and physical experiments demonstrate the effectiveness of the proposed approach, showing high localization accuracy, robust formation control, and strong resilience under realistic conditions.

RA-L 2026-04-09

TerAdapt: Proprioceptive Terrain-Adaptive Locomotion via Codebook Aligned Representation Learning

Yubiao Ma, Han Yu, Kai Guo, Chongming Chen, Wuwei Huang, Boyang Xing, et al.

Humanoid Robots; Legged / Quadruped Robots; Control & Dynamics
Abstract

Humanoid robots aim to achieve human-like locomotion in unstructured environments. However, designing a controller for such robots is highly challenging due to their inherent instability and the requirement to adapt to diverse terrains. To address this problem, we present TerAdapt, a proprioceptive terrain-adaptive locomotion framework that learns semantically meaningful terrain representations and delivers terrain-aware gait modulation directly from onboard proprioception. TerAdapt achieves this through Terrain Codebook Alignment (TCA), which discretizes elevation maps into a compact terrain codebook and aligns these semantic terrain tokens with a latent representation inferred purely from proprioceptive history. This alignment enables the policy to infer terrain categories and generate adaptive gaits solely from onboard proprioception. Extensive experiments in simulation and on the Unitree G1 humanoid robot demonstrate that TerAdapt achieves state-of-the-art performance among proprioceptive methods, delivering robust and adaptive locomotion across challenging terrains without any exteroceptive sensing.

RA-L 2026-04-09

VectorGlide: Realizing Continuum Drive Paradigm via Integrated Contact Kino-Dynamic MPC for Wheeled Quadruped Robots

Zhihao Zhang, Fei Meng, Maosen Wang, Botao Liu, Zhicheng Yuan, Wenyun He, et al.

Legged / Quadruped Robots; Navigation / SLAM / Autonomous Driving; Medical / Soft / Micro-Nano; Control & Dynamics
Abstract

This paper addresses the inherent inefficiencies and limited agility of traditional gait-based locomotion by proposing the continuum drive paradigm as a high-performance alternative. Existing control frameworks are limited by their simplified predictive models and decoupled planners, which prevent them from utilizing coupled whole-body dynamics, managing system redundancy, and planning stance-wheel motion. To solve these issues, we propose a unified framework built upon our novel VectorGlide kinematic strategy. This strategy enables agile, hardware-free steering by leveraging coordinated whole-body posture adjustments to satisfy the commanded Instantaneous Center of Rotation (ICR) geometry. The practical realization of this strategy is achieved through the Integrated Contact Kino-Dynamic Model Predictive Control (ICKD-MPC), which formulates the entire motion generation task as a single Optimal Control Problem (OCP) solved in real-time. Experimental results demonstrate the superiority of our framework, achieving up to a 24× improvement in tracking precision and a 50-75% enhancement in energy efficiency (Cost of Transport) compared to traditional gait-based locomotion.

RA-L 2026-04-09

Graph-Based Reinforcement Learning for Robot Decision Making in Collaborative Robotics

Martina Pelosi, Margherita Cosenza, Andrea Maria Zanchettin, Paolo Rocco

Manipulation & Robot Arms; Robot Learning; Human-Robot Interaction / Teleoperation
Abstract

Human-Robot Collaboration (HRC) is increasingly becoming a core element in modern industrial automation, as it enables the flexibility needed to meet diverse and rapidly changing production demands. However, a key challenge lies in combining production efficiency and flexibility. Robot decision-making in HRC should not only minimize production time and costs, but also adapt to heterogeneous assembly scenarios and unpredictable human actions. This paper addresses these challenges with a Graph Convolutional Network (GCN)-based decision-making framework trained through a Reinforcement Learning (RL) procedure on a randomized set of assembly processes. The proposed RL-GCN optimizes long-term assembly efficiency by dynamically assigning robot tasks in real time to adapt to human choices. Also, an online fine-tuning stage customizes the model weights to the specific assembly, further enhancing performance and supporting real deployment. Extensive offline simulations and real-world experiments, including scenarios with dynamically changing assembly structures, demonstrate that the proposed method improves assembly efficiency while maintaining the flexibility required for robust HRC.

RA-L 2026-04-20

Variable Stiffness Spring Leg With Human-Driven Stiffness Adaptation

Tiange Zhang, David J. Braun

Medical / Soft / Micro-Nano
Abstract

The bicycle transmission enables effortless gear shifting to amplify the force produced by human limbs, but it cannot amplify limb power. Conventional variable stiffness springs, placed in series or parallel with human limbs, can amplify both force and power by increasing stiffness. However, amplifying power requires energy proportional to the amount of energy stored in the spring. Here, we present a human-driven robot leg that functions as an energetically conservative variable stiffness spring. Using a pedaling mechanism, the leg stores energy supplied by the human and allows stiffness to be increased before the stored energy is released. This approach amplifies both force and power, analogous to shifting gears while pedaling a bicycle. The prototype confirms force amplification, power amplification via stiffness modulation, and low-cost stiffness adjustment, demonstrating the potential for human-driven robot exoskeletons that employ a pedaling mechanism to enable human energy input and exceed biological force and power limits during energy release.
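
The link between stiffness, force, and power can be checked with a short calculation. The sketch below assumes an ideal linear spring holding a fixed stored energy E and releasing it into a point mass. Under those assumptions the peak force is sqrt(2 k E), and the release time is a quarter period of the mass-spring oscillator. None of this models the prototype itself, and the numbers are hypothetical.

```python
import math


def peak_force(stored_energy_j, stiffness_n_per_m):
    """Peak force of an ideal linear spring holding a given energy: F = sqrt(2 k E)."""
    return math.sqrt(2.0 * stiffness_n_per_m * stored_energy_j)


def release_time(mass_kg, stiffness_n_per_m):
    """Quarter period of a mass-spring oscillator: time for the spring to unload."""
    return 0.5 * math.pi * math.sqrt(mass_kg / stiffness_n_per_m)


def mean_release_power(stored_energy_j, mass_kg, stiffness_n_per_m):
    """Average power while the stored energy is transferred to the mass."""
    return stored_energy_j / release_time(mass_kg, stiffness_n_per_m)


if __name__ == "__main__":
    energy, mass = 10.0, 5.0  # hypothetical: 10 J stored, 5 kg payload
    for stiffness in (1000.0, 4000.0):
        print(
            f"k = {stiffness:.0f} N/m: "
            f"peak force {peak_force(energy, stiffness):.1f} N, "
            f"mean power {mean_release_power(energy, mass, stiffness):.1f} W"
        )
```

With the stored energy held fixed, quadrupling the stiffness doubles both the peak force and the mean release power in this idealized model, which is the sense in which raising stiffness before release amplifies both.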

RA-L 2026-04-20

Calibration of Error Distributions in Robot Kinematics for Increased Precision in Manipulation Tasks

Tim Gerstewitz, Peter Lehner, Lukas Burkhard, Natalija Topalovic, Alin Albu-Schäffer, Daniel Leidner, et al.

Manipulation & Robot Arms
Abstract

The accuracy of robotic forward kinematics is commonly improved by calibration. However, most calibration methods only take deterministic errors, such as inaccurate geometry and unknown stiffnesses, into account and neglect errors with stochastic characteristics, including joint friction, gear backlash and component wear. In order to incorporate these effects, this paper presents a calibration procedure for a probabilistic forward kinematics model which identifies both systematic errors and error sources whose combined effects are modeled as distributions within a kinematic chain. Additionally, we present a method for using the identified distributions to enhance task-relevant accuracy by minimizing the end-effector uncertainty resulting from such distributions in a task-specific way. Finally, this idea is demonstrated in an experiment with a research rover tasked with picking up a payload box from a lander. Here, we show that end-effector error in task-relevant directions can be reduced by 40% by choosing low-uncertainty over high-uncertainty configurations.

RA-L 2026-04-20

Self-Reconfiguration Planning for Deformable Quadrilateral Modular Robots

Jie Gu, Hongrun Gao, Zhihao Xia, Yirun Sun, Chunxu Tian, Dan Zhang

Control & Dynamics
Abstract

While deformable modular self-reconfigurable robots (MSRRs) offer enhanced reconfiguration flexibility, strict kinematic constraints present complex self-reconfiguration planning challenges. This letter presents a novel self-reconfiguration planning algorithm for deformable quadrilateral MSRRs. The method first constructs feasible connect/disconnect actions using a virtual graph representation, and then organizes these actions into a valid execution sequence through a Dependence-based Reverse Tree (DRTree) that resolves interdependencies. We also prove that reconfiguration sequences satisfying motion characteristics exist for any pair of configurations with seven or more modules (excluding linear topologies). Finally, comparisons with a modified BiRRT algorithm highlight the superior efficiency and stability of our approach, while deployment on a physical robotic platform confirms its practical feasibility.

RA-L 2026-04-20

Autonomous Navigation in Unstructured Environments: A Probabilistic Approach for Generating Local Guidance With Limited Prior Information

Yafeng Bu, Zhenping Sun, Binhan Du, Yunfei Xie, Xiaohui Li, Hui Shen

Navigation / SLAM / Autonomous Driving
Abstract

Generating reliable local guidance information is crucial for autonomous navigation in unstructured environments with limited prior information. Conventional approaches often fuse navigation cues with different physical semantics into a single scalar objective through manually weighted cost terms, making the contribution of each cue difficult to interpret and often requiring scenario-specific parameter tuning. To address this issue, we propose a probabilistic local reference path generation method that uniformly models environmental traversability, goal direction, kinematic preference, and temporal consistency as angular distributions in a polar coordinate system. Their agreement with environmental evidence is quantified by the Bhattacharyya Coefficient to derive adaptive fusion weights. On top of the fused posterior, anchor inference is formulated as a dynamic-programming-based sequence optimization over selected radial layers, enabling real-time generation of smooth local reference paths in continuous space. We conducted simulation experiments on the BARN dataset, where the proposed method significantly outperforms baseline methods in terms of task completion rate, while maintaining competitive performance in metrics such as path length, smoothness, and safety. Furthermore, we conducted real-world vehicle experiments, including autonomous driving trials that covered 3.2 km in real off-road scenarios, which further validate the effectiveness and practicality of the method under limited prior information.
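
As an illustration of the agreement measure named in the abstract, the Bhattacharyya coefficient between two discrete distributions p and q is the sum over bins of sqrt(p_i * q_i). It equals 1 for identical distributions and 0 for non-overlapping ones. The sketch below applies it to angular histograms and uses the coefficients as mixture weights. The weighted-mixture fusion rule, the function names, and the example histograms are assumptions for illustration, not the paper's formulation.

```python
import math


def normalize(weights):
    """Scale non-negative bin weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("distribution must have positive total mass")
    return [w / total for w in weights]


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions defined on the same angular bins."""
    if len(p) != len(q):
        raise ValueError("distributions must use the same bins")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def fuse_angular_cues(cues, evidence):
    """Mix cue distributions, weighting each by its agreement with the evidence.

    Assumption: a weighted mixture is used here as one simple fusion rule.
    """
    evidence = normalize(evidence)
    cues = {name: normalize(dist) for name, dist in cues.items()}
    weights = {name: bhattacharyya_coefficient(dist, evidence) for name, dist in cues.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no cue overlaps with the evidence")
    fused = [
        sum(weights[name] * cues[name][i] for name in cues) / total
        for i in range(len(evidence))
    ]
    return fused, weights


if __name__ == "__main__":
    # Hypothetical 4-bin angular histograms (e.g. left, front-left, front-right, right).
    cues = {"goal_direction": [0, 1, 3, 0], "traversability": [2, 2, 1, 1]}
    fused, weights = fuse_angular_cues(cues, evidence=[0, 1, 2, 1])
    print("weights:", weights)
    print("fused:", fused)
```

A cue that agrees with the environmental evidence receives a larger weight without any hand-tuned cost term, which is the interpretability argument the abstract makes.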

RA-L 2026-04-20

A Motion Control Algorithm of Wheel-Legged Robot via Adaptive Dynamic Programming and Data-Driven Value Iteration

Xuefei Liu, Wai Tuck Chow, Youzhi Xu, Yi Sun, Andong Jiang, Ping Zhang, et al.

Legged / Quadruped Robots
Abstract

To enhance the motion adaptability of wheel-legged robots in complex environments, a novel data-driven value iteration (VI) control algorithm based on adaptive dynamic programming (ADP) is proposed. A state-space system model of the wheel-legged robot is established, from which the algebraic Riccati equation containing the weighting matrices is derived based on the Hamilton-Jacobi-Bellman equation. The ADP framework is employed to collect state variable data under external disturbance, and an offline VI-based iterative learning process is conducted to obtain a set of optimal gain matrices. These gain matrices are subsequently integrated with multi-subtask motion control and a torque allocation strategy to construct a unified control framework. Finally, a series of experimental evaluations, including dynamic balance, disturbance robustness, and stair-crossing, are conducted. The experimental results demonstrate that the proposed algorithm exhibits strong adaptability and superior disturbance robustness across complex and unstructured terrains.

RA-L 2026-04-20

Towards Robust Optimization-Based Autonomous Dynamic Soaring With a Fixed-Wing UAV

Marvin Harms, Jaeyoung Lim, David Rohr, Friedrich Rockenbauer, Nicholas Lawrance, Roland Siegwart

UAVs / Aerial Robots
Abstract

Dynamic soaring is a flying technique to exploit the energy available in wind shear layers, enabling potentially unlimited flight without the need for internal energy sources. We propose a framework for autonomous dynamic soaring with a fixed-wing UAV. The framework makes use of an explicit representation of the wind field and a classical approach for guidance and control of the UAV. Robustness to wind field estimation error is achieved by constructing point-wise robust reference paths for dynamic soaring and the development of a robust path following controller for the fixed-wing UAV. Wind estimation and path tracking performance are validated with real flight tests to demonstrate robust path-following in real wind conditions. In simulation, we demonstrate robust dynamic soaring flight subject to varied wind conditions, estimation errors and disturbances. Together, our results strongly indicate the ability of the proposed framework to achieve autonomous dynamic soaring flight in wind shear.

RA-L 2026-04-20

A Discrete Variable Stiffness Actuator for Robotic Hand

Jiahang Zhu, Ke Shi, Tongshu Chen, Maozeng Zhang, Aiguo Song

Manipulation & Robot Arms
Abstract

Variable stiffness actuators can dynamically adjust output stiffness to accommodate varying task demands, making them well suited for grasping applications. Nevertheless, conventional variable stiffness actuators often suffer from bulky structures, slow response, and complex control, which hinders their integration into robotic hands. To overcome these limitations, we designed a discrete variable stiffness actuator (DVSA) based on electrostatic clutches (ESCs). The DVSA comprises multiple parallel units, each with a linear spring in series with an ESC. By selectively activating ESCs, the DVSA couples different springs to achieve discrete stiffness modulation. This design provides a compact structure and fast response. Furthermore, a cable-driven robotic hand was developed with the DVSA integrated into the palm. A grasping control strategy was also designed to estimate object stiffness in real time and adjust the stiffness of the DVSA accordingly. The performance of the DVSA and the robotic hand was evaluated experimentally. The results indicate that the robotic hand demonstrates excellent stiffness modulation performance under both static and dynamic grasping conditions.
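
The discrete modulation described above can be sketched with a simple model. The sketch assumes ideal clutches, so an engaged clutch couples its spring rigidly and a released clutch contributes no force. Parallel units then add, and the output stiffness is the sum of the engaged springs' stiffnesses. The spring values, the function names, and the nearest-level selection rule are hypothetical and not taken from the paper.

```python
from itertools import product


def equivalent_stiffness(spring_stiffnesses, clutch_states):
    """Stiffness of parallel spring-clutch units: engaged springs simply add."""
    return sum(k for k, engaged in zip(spring_stiffnesses, clutch_states) if engaged)


def all_stiffness_levels(spring_stiffnesses):
    """Map every clutch on/off pattern to the stiffness it produces."""
    return {
        states: equivalent_stiffness(spring_stiffnesses, states)
        for states in product((False, True), repeat=len(spring_stiffnesses))
    }


def closest_clutch_pattern(spring_stiffnesses, target_stiffness):
    """Pick the clutch pattern whose stiffness is nearest to the target."""
    levels = all_stiffness_levels(spring_stiffnesses)
    return min(levels, key=lambda states: abs(levels[states] - target_stiffness))


if __name__ == "__main__":
    springs = [100.0, 200.0, 400.0]  # hypothetical stiffnesses in N/m
    print(sorted(set(all_stiffness_levels(springs).values())))
    print(closest_clutch_pattern(springs, target_stiffness=520.0))
```

Choosing binary-weighted springs, as in this example, gives 2^n evenly spaced levels from n units. Whether the actual design uses such a weighting is not stated in the abstract.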

RA-L 2026-04-20

Enabling Crab Driving for Rear Steering-Limited Vehicles via Coordinated Direct Yaw Moment Control and Steering Allocation

Xin Wang, Guoxiang Lu, Baoqi Ma, Fangyin Tian, Yifa Liu, Di Liu

Control & Dynamics
Abstract

Crab driving has significant application potential in complex driving conditions such as maneuvering in confined spaces, high-speed lane changes, and emergency obstacle avoidance. However, its practical application is impeded by small rear-wheel steering angles in mass-production vehicles. To overcome this limitation, we propose a hierarchical control framework that coordinates direct yaw moment with front-rear wheel steering. This framework actively compensates for undesired yaw motion and ensures accurate sideslip angle tracking, thereby enabling large-angle crab driving for vehicles with limited rear steering angles. The stability of the proposed controller is theoretically analyzed via Lyapunov theory. Simulations validate the proposed method's effectiveness and robustness, demonstrating a maximum crab driving angle of $\pm {40}^{\circ }$ under a strict $\pm {10}^{\circ }$ rear-wheel steering angle limit and confirming the method's robustness across diverse conditions.

RA-L 2026-04-20

Single-Motor-Driven Robotic Hand With Multiple Grippers Using Gravity-Driven Torque-Path Switching Mechanism

Toshihiro Nishimura, Hono Okatsu, Tetsuyou Watanabe

Manipulation & Manipulators
Abstract

This paper presents a novel multi-gripper robotic hand system driven by a single motor. To achieve the operation of multiple grippers through a single motor, this study develops a new transmission mechanism that switches the torque path to each gripper using magnetic coupling and gravity. The motor torque is transmitted to a target gripper via the magnetic couplings, and switching the target gripper is triggered by magnetic decoupling induced by intentional overtorque. Once decoupled, the pendulum-suspended motor swings under gravity to align with the next target gripper, achieving autonomous switching without additional actuators. In this paper, theoretical analysis of the magnetic couplings and design considerations to achieve the desired behavior are presented. Two prototypes—a dual-gripper hand and a five-gripper hand—were developed to validate the proposed concept. Quantitative evaluations demonstrate that the proposed system achieves a switching duration of 0.41 s with high repeatability, as well as robust and durable operation. Grasping experiments demonstrate versatile object handling and sequential multi-object manipulation using selective gripper switching.

RA-L 2026-04-20

The DBCF-EM Gripper: Using Dual-Belt Curved-Flexure Eversion Mechanism Fingers for Confined-Space Robotic Grasping

A. E. Huisjes, J. H. B. Friederich, J. L. Herder

Manipulation & Manipulators
Abstract

Conventional fingered grippers often struggle in confined spaces because limited lateral access prevents finger insertion and inward closing. This paper presents the DBCF-EM gripper, whose fingers combine a base-driven curved flexure, prescribing a tangential object-following trajectory, with a dual-belt eversion system that creates near-stationary contact surfaces and reduces sliding at the contact interfaces. This enables a low-disturbance caging grasp strategy in which the fingers propagate along the object surface rather than closing perpendicularly toward it. A prototype gripper was built for robotic tomato-removal experiments from a crate. Experiments showed contour following with a maximum deviation of 3 mm, negligible normal disturbance of at most 0.1 N, and a 91–97% reduction in tangential disturbance forces. In robotic trials, the gripper achieved 100% pick-up success and a 91% damage-free success rate, demonstrating its effectiveness for confined-space grasping.

RA-L 2026-04-20

Off-Road Navigation via Implicit Neural Representation of Terrain Traversability

Yixuan Jia, Qingyuan Li, Jonathan P. How

Navigation / SLAM / Autonomous Driving
Abstract

Autonomous off-road navigation requires robots to estimate terrain traversability from onboard sensors and plan motion accordingly. Conventional approaches typically rely on sampling-based planners such as MPPI to generate short-term control actions that aim to minimize traversal time and risk measures derived from the traversability estimates. These planners can react quickly but optimize only over a short look-ahead window, limiting their ability to reason about the full path geometry, which is important for navigating in challenging off-road environments. Moreover, they lack the ability to adjust speed based on the terrain-induced vibrations, which is important for smooth navigation on challenging terrains. In this paper, we introduce TRAIL (Traversability with an Implicit Learned Representation), an off-road navigation framework that leverages an implicit neural representation to model terrain properties as a continuous field that can be queried at arbitrary locations. This representation yields spatial gradients that enable integration with a novel gradient-based trajectory optimization method that adapts the path geometry and speed profile based on terrain traversability.

RA-L 2026-04-13

Dynamic Maclaurin-Series-Based Vision-Language-Action Model

Chong Yu, Zhongxue Gan

Robot Learning | Perception & Sensing
Abstract

Motivated by the Maclaurin-series theorem, the dynamic Maclaurin-series-based Vision-Language-Action (DMS-VLA) model introduces a unified approximation framework that replaces per-layer transformer weight tensors with a small set of shared Maclaurin-series templates and layer-specific coefficients, dramatically reducing memory footprint without degrading task performance. By dynamically adjusting the number of series terms per input using target-action differencing, DMS-VLA adapts computational effort to visual and language complexity. Evaluations on the LIBERO and SIMPLER benchmarks demonstrate that DMS-VLA matches or slightly exceeds state-of-the-art VLA baselines in success rates, while real-robot trials on ALOHA and Emergen platforms confirm comparable real-world efficacy. Deployment on NVIDIA H100 and RTX 4090 GPUs and Jetson AGX Orin edge hardware yields inference speedups of up to 25%–55%, validating DMS-VLA as an efficient, high-quality solution for resource-constrained embodied agents.

RA-L 2026-04-13

Self-Evolved Imitation Learning in Simulated World

Yifan Ye, Jun Cen, Jing Chen, Zhihe Lu

Robot Learning | Perception & Sensing
Abstract

Imitation learning has been a trend recently, yet training a generalist agent across multiple tasks still requires large-scale expert demonstrations, which are costly and labor-intensive to collect. To address the challenge of limited supervision, we propose Self-Evolved Imitation Learning (SEIL), a framework that progressively improves a few-shot model through simulator interactions. The model first attempts tasks in the simulator, from which successful trajectories are collected as new demonstrations for iterative refinement. To enhance the diversity of these demonstrations, SEIL employs dual-level augmentation: (i) Model-level, using an Exponential Moving Average (EMA) model to collaborate with the primary model, and (ii) Environment-level, introducing slight variations in initial object positions. We further introduce a lightweight selector that filters complementary and informative trajectories from the generated pool to ensure demonstration quality. These curated samples enable the model to achieve competitive performance with far fewer training examples. Extensive experiments on the LIBERO benchmark show that SEIL achieves a new state-of-the-art performance in few-shot imitation learning. Code is available at https://github.com/Jasper-aaa/SEIL.git.
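
The model-level augmentation above relies on an EMA copy of the primary model. The abstract does not give the update rule or decay value, so the following is only a sketch of the standard EMA parameter update, with parameters held in a plain dictionary. The function name `ema_update`, the decay of 0.9, and the toy parameters are assumptions for illustration.

```python
def ema_update(ema_params, model_params, decay=0.99):
    """Return EMA parameters after one step: ema <- decay*ema + (1-decay)*model."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * model_params[name]
        for name in ema_params
    }

# Toy example: the EMA weights drift slowly toward the primary model's weights.
ema = {"w": 0.0, "b": 1.0}
for step in range(3):
    primary = {"w": 1.0, "b": 1.0}  # hypothetical primary-model parameters
    ema = ema_update(ema, primary, decay=0.9)
print(ema)
```

Because the EMA copy lags behind the primary model, the two can behave differently on the same task, which is one plausible source of the trajectory diversity the abstract mentions.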

RA-L 2026-04-13

DG-ACMP: Deformation-Guided Motion Planning With Acceptable Contacts for Manipulators in Cluttered Environments

Yize Guo, Jiacheng Li, Qingchen Liu, Weiming Fu, Jiahu Qin, Yu Kang

Manipulation & Manipulators | Navigation / SLAM / Autonomous Driving
Abstract

In cluttered environments where rigid and deformable objects coexist, collision-free paths often do not exist. Planners that enforce collision-free trajectories therefore perform poorly by excluding feasible contact-aware trajectories. We introduce the deformation-guided acceptable-contact motion planning (DG-ACMP) framework, which enables controllable contact with deformable objects while avoiding rigid objects to enhance dexterity and feasibility. DG-ACMP employs a Kelvin-Voigt viscoelastic model for soft contact, integrated with obstacle-specific Laplacian deformation fields that approximate normal strain and mitigate local minima issues in thin-plate obstacles, in contrast to traditional SDF-based approaches. The contact model is formulated as likelihood factors within a Gaussian Process Motion Planning (GPMP) factor graph, enabling efficient trajectory optimization. Compared to contact-implicit optimization baselines, DG-ACMP reduces optimization time by up to two orders of magnitude while achieving higher success rates and lower contact forces in comparative simulations across non-convex, narrow-passage, and cluttered scenes. Real-world experiments on a Franka Emika manipulator in cluttered cabinet tasks demonstrate DG-ACMP's ability to produce gentle, velocity-regulated contacts with compliant items, succeeding where collision-free paths are infeasible.
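
For readers unfamiliar with the Kelvin-Voigt model named above, it treats soft contact as a spring and a damper in parallel, so the normal force is a stiffness term plus a damping term. The sketch below shows that textbook form only. The function name, the parameter values, and the clamp that stops the contact from pulling are assumptions, not details taken from the paper.

```python
def kelvin_voigt_force(penetration, penetration_rate, stiffness, damping):
    """Normal contact force of a spring and damper in parallel (Kelvin-Voigt)."""
    if penetration <= 0.0:
        return 0.0  # not in contact, so no force
    force = stiffness * penetration + damping * penetration_rate
    # Assumption: contact can push but never pull, so clamp at zero.
    return max(force, 0.0)

# Hypothetical values: 1 cm penetration, closing at 5 cm/s,
# stiffness 200 N/m, damping 10 N*s/m.
print(kelvin_voigt_force(0.01, 0.05, 200.0, 10.0))
```

The damping term is what makes the force depend on approach speed, which is consistent with the velocity-regulated contacts reported in the experiments.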

RA-L 2026-04-13

Native-Domain Cross-Attention for Camera–LiDAR Extrinsic Calibration Under Large Initial Perturbations

Ni Ou, Zhuo Chen, Xinru Zhang, Junzheng Wang

Robot Learning | Perception & Sensing
Abstract

Accurate camera–LiDAR fusion relies on precise extrinsic calibration, which fundamentally depends on establishing reliable cross-modal correspondences under potentially large misalignments. Existing learning-based methods typically project LiDAR points into depth maps for feature fusion, which distorts 3D geometry and degrades performance when the extrinsic initialization is far from the ground truth. To address this issue, we propose an extrinsic-aware cross-attention framework that directly aligns image patches and LiDAR point groups in their native domains. The proposed attention mechanism explicitly injects extrinsic parameter hypotheses into the correspondence modeling process, enabling geometry-consistent cross-modal interaction without relying on projected 2D depth maps. Extensive experiments on the KITTI and nuScenes benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches in both accuracy and robustness. Under large extrinsic perturbations, our approach achieves accurate calibration in 88% of KITTI cases and 99% of nuScenes cases, substantially surpassing the second-best baseline. Our code is available at https://github.com/gitouni/ProjFusion.

JFR 2026-05-10

LMBC: Low‐Power Marine Benthos Counting Framework for Underwater Robotic Real‐Time Applications

Ning Wang, Haiyan Zhao, Tao Zheng

Perception & Sensing
Abstract

Accurate and efficient marine benthos counting is vital for aquaculture management and ecological monitoring, yet it remains highly challenging in underwater environments characterized by limited visibility, cluttered backgrounds, and the constraints of low‐power robotic platforms. This paper proposes a low‐power‐deployed marine benthos counting (LMBC) framework, specifically designed for real‐time, robust, and energy‐efficient enumeration on embedded underwater robots. The LMBC framework integrates three task‐oriented modules: (i) a lightweight prediction head optimized detector (LPHOD) that enhances tiny benthos detection while maintaining low computational complexity, (ii) a confidence ranking–cascade matching tracker (CRCMT) that improves identity preservation under occlusions and fluctuating detection confidence, and (iii) an accurate classification counting module that refines final counts by filtering spurious tracks and enforcing temporal consistency. Quantitative evaluations demonstrate that the proposed LPHOD achieves a detection accuracy of 86.2% mAP@0.5 on the DUO data set, while the CRCMT attains a HOTA score of 57.5 with substantially reduced identity switches compared with representative trackers. End‐to‐end counting experiments further show that LMBC achieves a root mean square error of 13.8 and a mean absolute percentage error of 17.5%, outperforming baseline tracking‐based counting schemes. Implemented on an NVIDIA Jetson Xavier NX, the complete framework operates in real time at 13.3 FPS, validating its suitability for field‐deployed autonomous underwater robots in aquaculture and ecological monitoring scenarios.
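
The two counting metrics quoted above, root mean square error and mean absolute percentage error, can be computed as in the following sketch. The per-sequence counts are made-up illustrative numbers, not data from the paper, and MAPE as written is undefined when a true count is zero.

```python
import math

def rmse(true_counts, predicted_counts):
    """Root mean square error between true and predicted counts."""
    squared = [(p - t) ** 2 for t, p in zip(true_counts, predicted_counts)]
    return math.sqrt(sum(squared) / len(squared))

def mape(true_counts, predicted_counts):
    """Mean absolute percentage error in percent (true counts must be non-zero)."""
    ratios = [abs(p - t) / t for t, p in zip(true_counts, predicted_counts)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical per-sequence counts, for illustration only.
truth = [50, 80, 100]
predicted = [45, 88, 90]
print(round(rmse(truth, predicted), 3), round(mape(truth, predicted), 3))
```

RMSE is in units of counted animals and is dominated by the largest misses, while MAPE normalizes each error by the true count, so the two figures are not directly comparable.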

T-RO 2026-03-30

On Solving the Differential Direct Kinematics of Planar, Spherical, Orientational, and Translational Linkages

Joseph Massin, Lionel Birglen

Abstract

Twists and wrenches, as defined in screw theory, are commonly used tools to establish the input/output velocity relationship of linkages, where the output velocity is written as the product of a Jacobian matrix and the input velocity array. However, for parallel mechanisms and, more generally, mechanisms with arbitrary topologies, this Jacobian matrix is typically derived through at least one numerical matrix inversion. This step may introduce numerical errors, particularly due to inconsistent units across the matrix components. This paper presents a new method for explicitly formulating the Jacobian matrix of a broad class of linkages. This class includes all planar and spherical linkages, but also parallel mechanisms whose end-effector may only translate or rotate about a fixed point, regardless of topological complexity. Central to this method is a new matrix called the twist matrix, constructed from appropriately chosen wrenches. Using this matrix, the Jacobian of any body within the mechanism, not just the moving platform, can be derived explicitly as a combination of elementary twist matrices, independently from the mechanism's complexity.

T-RO 2026-03-24

Acceleration-Free Analytical Regressor Filtering for Robot Online Identification and Control

Tian Shi, Weibing Li, Yongping Pan

Control & Dynamics
Abstract

Avoiding the usage of joint accelerations during robot online identification and control is significant in improving modeling and tracking accuracy. Regressor filtering is feasible to achieve acceleration-free online identification, where the robot dynamics is linearly parameterized and filtered to obtain an acceleration-free filtered regressor. Nevertheless, existing calculation methods of the filtered regressor are either applicable only to robots with low degrees of freedom (DoFs) or restricted by robot modeling techniques. We propose an acceleration-free analytical regressor filtering (AF-ARF) method to obtain the filtered regressor without restrictions on the number of DoFs or robot modeling techniques, where joint accelerations are eliminated by integration by parts, and the filtered regressor is derived by using the skew-symmetric property of the inertia matrix and some matrix operations. An acceleration-free composite learning robot control strategy based on AF-ARF is developed for exact online identification and control, where closed-loop exponential stability with parameter convergence is established under a weakened condition of interval excitation. Simulative and experimental comparisons based on a seven-DoF industrial robot have validated the superiority of our method over state-of-the-art methods in online identification, model prediction, and tracking control under reduced computing burden.

T-RO 2026-03-24

Model-Free Co-Optimization of Manufacturable Sensor Layouts and Deformation Proprioception

Yingjun Tian, Guoxin Fang, Aoran Lyu, Xilong Wang, Zikang Shi, Yuhu Guo, et al.

Medical / Soft / Micro-Nano
Abstract

Flexible sensors are increasingly employed in soft robotics and wearable devices to provide proprioception of freeform deformations. Although supervised learning can train shape predictors from sensor signals, prediction accuracy strongly depends on sensor layout, which is typically determined heuristically or through trial-and-error. This work introduces a model-free, data-driven computational pipeline that jointly optimizes the number, length, and placement of flexible length-measurement sensors together with the parameters of a shape prediction network for large freeform deformations. Unlike model-based approaches, the proposed method relies solely on datasets of deformed shapes, without requiring physical simulation models, and is therefore broadly applicable to diverse robotic sensing tasks. The pipeline incorporates differentiable loss functions that account for both prediction accuracy and manufacturability constraints. By co-optimizing sensor layouts and network parameters, the method significantly improves deformation prediction accuracy over unoptimized layouts while ensuring practical feasibility. The effectiveness and generality of the approach are validated through numerical and physical experiments on multiple soft robotic and wearable systems.

RA-L 2026-04-24

Correction to “Clustered Orienteering Problem With Subgroups”

Luciano E. Almeida, Cristiano Arbex Valle, Douglas G. Macharet

Abstract

The authors’ affiliation in [1] was incorrectly modified in the final published version. The correct affiliation is Universidade Federal de Minas Gerais (UFMG), located in Belo Horizonte, Minas Gerais, Brazil.

RA-L 2026-04-06

FlockCounter: Automated Counting for Large-Scale Bird Flocks

Tianjiang Hu, Weiming Yan, Zijie Sun

UAVs / Aerial Robots | Robot Learning | Multi-Robot / Swarms
Abstract

Automated bird counting is crucial for ecological assessment, yet current methods often lack accuracy with large-scale flocks. This paper proposes and develops FlockCounter, a novel approach for end-to-end flock counting that relies solely on synthetic training data. Our method consists of three stages. First, labeled synthetic data of large-scale flocks is automatically generated by using a collective behavioral model with only a limited set of real-world bird flock trajectories. Second, each annotated image is enhanced for visual realism via style transfer. Finally, we train a customized deep learning network exclusively on synthetic data to achieve accurate bird flock counting. As demonstrated in the testing results, FlockCounter achieves state-of-the-art cross-domain performance on real-world datasets, attaining 91.3% accuracy on ground-based images and 92.3% on aerial-view images, substantially outperforming existing methods. This work provides a scalable, annotation-free solution for accurate large-scale bird flock counting, supporting efficient ecological monitoring and conservation efforts.

RA-L 2026-04-06

Marking Underwater Infrastructure: A Crab-Like Robot for Traversing Pipes

Jiyuan Jiang, John Grezmak, Kathryn A Daltorio

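Legged / Quadruped Robots | Manipulation & Manipulators | Navigation / SLAM / Autonomous Driving | Medical / Soft / Micro-Nano | Control & Dynamics
Abstract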

Underwater infrastructure projects are increasingly seeking robots to inspect, repair, maintain, and salvage in the ocean with minimal human diver time. To perform tasks that would otherwise require human workers, most remotely operated vehicles (ROVs) are designed to operate either while resting on bottom surfaces (which provides stability) or while suspended neutrally buoyant in the water column (which expands workspace, but requires careful thrust control for large forces or precise motion). Here, our hypothesis is that by using the legs to locomote on and grasp infrastructure substrates, stable toolpaths are generated while walking, without relying on localization or thrust actuators. To our knowledge, the robot presented here is the first legged, underwater robot capable of walking along pipes. To show repeatable, consistent toolpaths, we attach a marker that leaves continuous paths on the cylindrical surface of the pipe. Specifically, we demonstrate that our half-meter-wide hexapod walks horizontally or climbs vertically on both 25.4 cm and 45.7 cm diameter pipes. Full (360°) circumferences on both pipes are completed in 75 s, and a 3.5 cm × 2.5 cm square is drawn in 3 s. Individually, these demonstrations parallel future application-specific tasks such as cutting a piling into sections for salvage, cleaning patches of pipeline for ultrasonic testing, or dislodging invasive species. Taken together, our work suggests that bio-inspired legs can provide robust robotic solutions to support ambitious undersea construction projects. Future work is expected to demonstrate the value of combining closed-loop model-based control, adaptive behaviors for addressing biofouling or other pipe irregularities, and underwater thrusters for initial positioning.

RA-L 2026-04-06

Sonar Mapping and Obstacle Avoidance for Autonomous Underwater Vehicles in Unknown Marine Environments

Guoshun Liu, Shanmin Zhou, Huarong Zheng, Shuo Liu, Wen Xu

Navigation / SLAM / Autonomous Driving | Perception & Sensing | Control & Dynamics
Abstract

This paper proposes a safe navigation framework for autonomous underwater vehicles (AUVs) operating in unknown environments that integrates sonar mapping and motion planning. The challenge is that sparse sonar data causes incomplete obstacle boundaries, posing risks to the vehicle's safe navigation. Furthermore, the AUV's nonlinear dynamics and constrained onboard computing capacity complicate real-time planning. To address these issues, a Bayesian kernel inference method is first adopted to estimate occupancy in unobserved regions. Its accuracy is enhanced by an entropy-based adaptive approach that adjusts the training set and kernel length. Then, a multilayer motion planner is designed, integrating global path planning with local trajectory generation. Specifically, a novel 3D path planner capable of handling AUV pitch and yaw constraints for efficient search is proposed, followed by path refinement and spatiotemporal optimization within the sonar's perception range to obtain locally optimal trajectories. The sonar mapping and obstacle avoidance motion planning are evaluated through simulations and pool experiments using a real AUV. Results demonstrate the effectiveness of the proposed underwater mapping and planning framework.

RA-L 2026-04-06

Safe Policy Optimization With Cost Practical Stability: A DQ-Learning Method

Chenyu Wang, Linkai Liu, Quan Quan

UAVs / Aerial Robots | Robot Learning | Control & Dynamics
Abstract

Safety is a critical requirement in robotics, leading to extensive research in Safe Reinforcement Learning. While most existing methods utilize soft constraints to limit cumulative violations, robotic control scenarios demand that instantaneous safety costs also converge rapidly following a safety excursion. Inspired by practical stability from safety control, we formulate cost practical stability as a novel constraint. We introduce DQ-learning, an algorithm derived from D-learning, a differential predictive learning method ensuring system stability, to satisfy this constraint. DQ-learning merges the safety stability of D-learning with the reward optimality of Q-learning, incorporating this constraint within a policy optimization framework. Evaluations on multiple robotic safety benchmarks and a sim-to-real drone target-catching task demonstrate that DQ-learning significantly outperforms baseline methods in both constraint satisfaction and sample efficiency.

RA-L 2026-04-06

Three-Dimensional Circumnavigating Control of Unmanned Aerial Vehicles Using Range-Only Measurements

Xuancheng You, Baoli Ma, Lixia Yan

UAVs / Aerial Robots | Navigation / SLAM / Autonomous Driving | Control & Dynamics
Abstract

This letter investigates the three-dimensional (3D) circumnavigating control for a single unmanned aerial vehicle (UAV) using only range measurements in GPS-denied environments. The core challenge lies in the unavailability of bearing information, rendering traditional control laws ineffective. To formulate the control design, we propose a hierarchical motion decomposition strategy by introducing a virtual anchor, which conceptualizes the UAV's movement as a superposition of orbital and translational motions. Based on this formulation, a global circumnavigation control law is developed by fusing a nonlinear observer with the system dynamics. A unified stability analysis is presented to guarantee that the circumnavigation errors converge to zero from any initial state, without relying on the separation principle. Numerical simulations and field experiments are performed to validate the theoretical results.

RA-L 2026-04-06

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

Rickmer Krohn, Vignesh Prasad, Gabriele Tiboni, Georgia Chalvatzaki

Manipulation & Manipulators | Robot Learning | Perception & Sensing | Control & Dynamics
Abstract

Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise, and changes in object dynamics. Evaluations in multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach exhibits strong robustness to perturbations and achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control.

RA-L 2026-04-06

BatchRRT: A Fully-Batched Reformulation of RRT With GPU-Accelerated SDF Collision Checking for Real-Time Human-Robot Collision Avoidance

Jie Liu, Meng Li, Zian Liu, Tianmei Sun

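Manipulation & Manipulators | Navigation / SLAM / Autonomous Driving | Perception & Sensing | Human-Robot Interaction / Teleoperation
Abstract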

Industrial human-robot collaboration requires real-time motion planning with accurate collision avoidance in dynamic environments. While recent neural signed distance field (SDF) methods achieve sub-millisecond inference from sparse observations, traditional RRT planners fail to exploit this capability due to serial execution and linear nearest-neighbor search bottlenecks. We present BatchRRT, a fully-batched reformulation of RRT that jointly optimizes batch sampling, KD-tree acceleration, and GPU-parallel SDF collision checking. Through comprehensive experiments across 5,340 planning trials in human-robot interaction scenarios spanning single and dual-person configurations, we demonstrate 4-5× speedup over baseline RRT with consistent 99-100% success rate. Our key finding reveals that with fast neural SDFs, nearest-neighbor search—not collision detection—becomes the dominant bottleneck, with KD-tree optimization providing 15.6× acceleration. Systematic ablation studies show that these three components exhibit strong interdependence: isolated batch sampling provides only modest gains (1.1×), while their combination unlocks substantial improvements. We validate our approach on a 6-DOF industrial manipulator navigating around dynamically posed human obstacles in both reaching and bending motions, including dual-person scenarios.
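
The key finding above is that nearest-neighbor search, not collision checking, dominates once the SDF is fast. The sketch below shows the KD-tree idea on the CPU in plain Python: a tree built once over existing nodes, then queried with subtree pruning. It is not the paper's GPU-batched implementation, which the abstract does not detail. The function names and the random 2D nodes are hypothetical, and a real RRT would also need incremental insertion.

```python
import random

def build_kdtree(points, depth=0):
    """Recursively build a KD-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, depth=0, best=None):
    """Return the tree point closest to query, pruning far subtrees."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best

# Hypothetical planner nodes in a unit square; compare against a linear scan.
random.seed(0)
nodes = [(random.random(), random.random()) for _ in range(200)]
tree = build_kdtree(nodes)
sample = (0.5, 0.5)
brute_force = min(nodes, key=lambda p: sq_dist(p, sample))
print(sq_dist(nearest(tree, sample), sample) == sq_dist(brute_force, sample))
```

The linear scan visits every node per query, whereas the pruned search typically visits a number of nodes that grows logarithmically with tree size, which is why the search structure matters once collision checks become cheap.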

RA-L 2026-04-06

Adaptive Visual-Tactile Fusion for Contact-Rich Dexterous Manipulation

Zedong Cai, Chuan Liu, Siyuan Li, Peng Liu

Manipulation & Manipulators | Robot Learning | Perception & Sensing | Control & Dynamics
Abstract

Effective dexterous manipulation hinges on the dynamic integration of visual context with fine-grained tactile feedback. This remains a significant challenge, as existing methods often rely on static fusion strategies and struggle to learn from isolated tactile features. To address this, we propose a hardware-decoupled tactile representation that learns cross-finger spatial features by training a sparse Transformer on unified tactile images, enabling cross-device generalization. Furthermore, we introduce a tactile-activity-guided adaptive visual-tactile fusion mechanism that dynamically adjusts the influence of vision and touch, showing the contribution of touch feedback upon physical contact. We evaluate our method on a series of contact-rich manipulation tasks requiring fine force control. Experimental results show that our method has an average success rate of over 90%, demonstrating effectiveness compared to other methods. Further analysis shows that adaptive multimodal fusion is essential to complete the dexterous manipulation tasks. The videos can be viewed at https://avtf-dex.github.io .

JFR 2026-04-27

TriRock6W: Autonomous Mobile Robot With Six Wheels, Three Rocker Arms in Complex Environments

Shaocong Wang, Ting Wang, Shiliang Shao, Cunyi Pan, Kai Zhang, Jinguo Liu

Navigation / SLAM / Autonomous Driving | Perception & Sensing | Control & Dynamics
Abstract

Autonomous mobile robots are increasingly used to replace humans in long‐term, high‐frequency, and repetitive tasks. However, large‐scale, high‐dynamics, and diverse terrain significantly challenge the stable application of mobile robots in complex environments. In this paper, we design and develop TriRock6W, a mobile robot for outdoor complex environments. The robot adopts a unique configuration with six wheels and three rocker arms that passively adapts to terrain, enabling enhanced mobility and obstacle‐crossing capability. A tightly coupled LiDAR SLAM system is developed to integrate mapping and re‐localization, where multi‐source feature registration improves robustness and a keyframe‐based feature map enhances re‐localization efficiency. Moreover, a hierarchical obstacle avoidance strategy is employed to ensure safety during navigation in highly dynamic environments. For long‐term application, we propose a SLAM‐based adaptive PID controller that enables autonomous charger docking, assisted by a V‐shaped passive guiding mechanism for a safe and stable docking process. Extensive experiments were conducted to evaluate the robot's capabilities, demonstrating reliable performance in complex environments. Notably, the robot underwent field trials over 8 months in densely populated complex environments, validating its suitability for long‐term deployment.

RA-L 2026-04-22

Hey, Robot! Don't Cut in Line: Designing Queuing Behaviors of Delivery Robots With Multiple Stakeholders

Nayoung Kim, Hyochang Kim, Sunok Lee, Changjoo Nam

Abstract

In situations where a delivery robot shares limited resources with people who are incidentally copresent in the same space, its behavior affects both those individuals and customers awaiting delivery. In this paper, we refer to such individuals as incidentally copresent persons (InCops). Most prior studies focus on a single stakeholder, leaving a gap for deployment in shared environments. We examined how robot behavior influences customers' perceived waiting time and the impressions formed by both InCops and customers, using a multi-stakeholder perspective. A key scenario involves a robot and an InCop meeting at an elevator door, where the robot performs a queuing behavior. We compared yielding, cutting in line, and first-in-first-out, along with four information types shown to customers. User journey maps captured qualitative insights. Results revealed that preferences for robot queuing behavior differed by stakeholder, with InCops responding primarily to norm compliance and customers relying more on system transparency to interpret waiting experiences. These findings suggest that socially aligned, humble conduct and visible intent are critical for delivery robots operating in resource-limited environments that balance fairness with efficiency.

RA-L 2026-04-09

Towards 3D Proprioception for Supernumerary Robotic Limbs: Design and Validation of a Mixed-Content Audio Feedback Scheme

Shufang Zheng, Min Li, Chunlei Wu, Qingqiang Wu, Jiahuan Wang, Jun Xie, et al.

Navigation / SLAM / Autonomous Driving | Perception & Sensing
Abstract

Supernumerary robotic limbs (SRLs) are extra robotic appendages that require sensory-motor integration for intuitive control, yet most lack proprioceptive feedback. Existing approaches using vibrotactile or electrotactile cues often feel unnatural and offer limited resolution. We present a real-time spatial audio system that provides users with the 3D position of an SRL's end-effector without compromising their native motor or tactile functions. Implemented in ROS2, the system was evaluated for spatial localization error and resolution. Since SRLs are controlled through continuous motion, we prioritized dynamic performance, where the mean localization error was 14.06° in azimuth, 20.41° in elevation, and 0.078 m in distance (with a corresponding dynamic resolution of 4.42°, 2.13°, and 0.044 m). For context, the system's static spatial acuity was also high, achieving a 3AFC success rate of 93.5% in azimuth, 68.0% in elevation, and 87.0% in distance, and a static resolution of 8.43°, 2.59°, and 0.031 m. In the real robotic arm end-effector aiming test, our distance perception error was 0.137 ± 0.046 m. These results demonstrate that the proposed system provides accurate and reliable artificial proprioception for SRLs, offering a promising pathway toward more seamless control.

RA-L 2026-04-09

Minimum Distance Calculation Using Approximate Convex Decomposition for Safe Human–Robot Interaction

Yu Ren, Hui Du, Hui Zhao

Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract

In human-robot interaction (HRI), maintaining a safe distance is the primary requirement. The flexibility and safety of HRI are often limited by the efficiency and accuracy of distance algorithms. This study proposes a fast and accurate method based on the Gilbert-Johnson-Keerthi (GJK) algorithm by optimizing the geometric representations of both humans and robots. To model the human accurately and stably, its point cloud is captured by a depth camera and decomposed into convex polyhedra using the approximate convex decomposition method with a novel area concavity. Meanwhile, the support function of the capsule is optimized to represent the robot more efficiently. A comprehensive metric is designed to jointly evaluate both the efficiency and accuracy of the algorithms. Experiments demonstrate that the area concavity remains stable under the perspective rule. Moreover, the proposed minimum distance algorithm exhibits optimal overall performance in both efficiency and accuracy while effectively preventing false alarms.
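GJK only needs a support function for each convex shape, that is, the farthest point of the shape along a query direction. The sketch below shows the standard textbook support function of a capsule (a line segment swept by a sphere). It is not the optimized variant proposed in the paper, and the function name `capsule_support` is hypothetical.

```python
import math


def capsule_support(a, b, radius, d):
    """Farthest point along direction d of the capsule with axis a-b.

    a, b: segment endpoints as 3-tuples; radius: capsule radius;
    d: non-zero query direction as a 3-tuple.
    """
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    dot_a = sum(p * q for p, q in zip(a, d))
    dot_b = sum(p * q for p, q in zip(b, d))
    # Support of the segment is whichever endpoint lies farther along d.
    core = a if dot_a >= dot_b else b
    # Support of the sphere adds radius times the unit direction.
    return tuple(c + radius * x / norm for c, x in zip(core, d))
```

Because the capsule is the Minkowski sum of a segment and a sphere, its support point is the sum of the two individual support points, which is why a robot link can be queried in constant time.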

RA-L 2026-04-09

MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation

Jan Ole von Hartz, Lukas Schweizer, Joschka Boedecker, Abhinav Valada

Manipulation & Robot Arms · Robot Learning
Abstract

Generative robot policies such as Flow Matching offer flexible, multi-modal policy learning but are sample-inefficient. Although object-centric policies improve sample efficiency, they do not resolve this limitation. In this work, we propose Multi-Stream Generative Policy (MSG), an inference-time composition framework that trains multiple object-centric policies and combines them at inference to improve generalization and sample efficiency. MSG is model-agnostic and inference-only, hence widely applicable to various generative policies and training paradigms. We perform extensive experiments both in simulation and on a real robot, demonstrating that our approach learns high-quality generative policies from as few as five demonstrations, resulting in a 95% reduction in demonstrations, and improves policy performance by 89% compared to single-stream approaches. Furthermore, we present comprehensive ablation studies on various composition strategies and provide practical recommendations for deployment. Finally, MSG enables zero-shot object instance transfer. We make our code publicly available at https://msg.cs.uni-freiburg.de.

RA-L 2026-04-09

Robust Geospatial Coordination of Multi-Agent Communications Networks Under Attrition

Jonathan S. Kent, Eliana Stefani, Brian Plancher

UAVs / Aerial Robots · Multi-Robot / Swarms
Abstract

Coordinating emergency responses in extreme environments, such as wildfires, requires resilient and high-bandwidth communication backbones. While autonomous aerial swarms can establish ad-hoc networks to provide this connectivity, the high risk of individual node attrition in these settings often leads to network fragmentation and mission-critical downtime. To overcome this challenge, we introduce and formalize the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We then introduce Physics-Informed Robust Employment of Multi-Agent Networks (ΦIREMAN), a topological algorithm leveraging physics-inspired potential fields to solve this problem. In our evaluations, ΦIREMAN consistently outperforms baselines, and is able to maintain greater than 99.9% task uptime despite substantial attrition in simulations with up to 100 tasks and 500 drones, demonstrating both effectiveness and scalability.

RA-L 2026-04-09

ESP-SLAM: Efficient Submap Partitioning for Large-Scale 3D Gaussian Splatting-Based SLAM

Jisung Bae, Hwichang Kim, Jinwoo Choi, Ji-Hoon Hwang, Dong-Wook Kim, Kun Park, et al.

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Recent SLAM systems have adopted 3D Gaussian Splatting (3DGS) to generate photorealistic maps. Despite its high fidelity, scaling 3DGS-SLAM to large scenes remains challenging. In most implementations, 3DGS maintains a single globally shared set of Gaussians, which can cause catastrophic forgetting and excessive GPU memory growth under continual online optimization. Submap partitioning alleviates these issues, but heuristic split rules often trigger unnecessary splits, leading to unstable initialization and inconsistent mapping. We propose a novel 3DGS-based SLAM framework that integrates efficient submap partitioning (ESP) with a Background Submap Management (BSM) module. ESP identifies reliable split moments by monitoring Gaussian representation quality and filtering out unstable observations that may compromise mapping stability. Meanwhile, BSM refines Gaussians in the background and manages memory to support scalable deployment. Experiments on benchmark datasets demonstrate robust SLAM performance while reducing peak GPU memory usage by up to 48%. Ablation studies further validate the contribution of each component and the overall robustness of the system.

RA-L 2026-04-20

Can We Get There Faster: Tuning Sample-Based Path Planners

Chaz Cornwall, Casey Majhor, Logan Schexnaydre, Ian Mattson, Jeremy P. Bos

Abstract

Sample-based path planners (SBPs) must balance sampling time and path optimality in complex domains. Without an adequate balance, SBPs will either take too long sampling or return a path with too much excess path length (EPL). Knowing and exploiting the relationship between sampling and EPL enables faster convergence to the optimal path. However, most models of this relationship are either overly restrictive or rely on indirect representations of EPL. We show that a useful, direct relationship between the number of samples and EPL in the presence of sparse obstacles is a probability distribution function consisting of a binomial expansion of gamma distributions. Using simulations of SBPs, we show that our proposed distribution is able to infer planner parameters from empirical data. We also present an algorithm that uses our distribution to improve the convergence of SBPs. Simulations show that our algorithm reduces median path length by approximately 10% in higher dimensions without significantly reducing success rate. GitHub: https://github.com/chazcornwall/can_we_get_there_faster.

RA-L 2026-04-20

Parallel Mechanism-Type Skill-Assist Arm Using a Passive-State Actuator to Aid Movement of Limbs

Kengo Tanaka, Hiroaki Kozuka, Hiroshi Tachiya

Abstract

Robotic arms equipped with passive-state actuators to assist beginners in performing upper-limb motions have been developed for tasks such as welding that require precise positioning. The arms are equipped with position-controlled actuators, one of which can be switched to a passive-state actuator by turning off its excitation. When the passive-state actuator is active, the operator can directly input motion, and the robot assists in positioning along a target trajectory by coordinating with the operator's motion and selectively switching actuator excitations. A three-degree-of-freedom (3-DOF) serial-type arm using the above method has been reported in prior work. However, it exhibited issues such as motion discontinuities caused by inertial effects during excitation switching. To address these problems, this study proposes a lightweight arm based on a parallel mechanism, incorporating torque control at the passive-state actuator to achieve stable assistance. A prototype was fabricated, and experiments were conducted in which the arm assisted subjects (assumed to be beginners) in upper-limb positioning tasks. The experimental results confirmed that the proposed arm effectively assists in positioning and demonstrated its practical feasibility.

RA-L 2026-04-13

DIJIT: A Robotic Head for an Active Observer

Mostafa Kamali Tabrizi, Mingshi Chi, Bir Bikram Dey, Kelly Yuan, Markus D. Solbach, Yiqian Liu, et al.

Perception & Sensing
Abstract

We present DIJIT, a novel binocular robotic head expressly designed for mobile agents that behave as active observers. DIJIT's unique breadth of functionality enables active vision research and the study of human-like eye and head-neck motions, their interrelationships, and how each contributes to visual ability. DIJIT is also being used to explore the differences between how human vision employs eye/head movements to solve visual tasks and current computer vision methods. DIJIT's design features nine mechanical degrees of freedom, while the cameras and lenses provide an additional four optical degrees of freedom. The ranges and speeds of the mechanical design are comparable to human performance. DIJIT attains 85% of the peak human saccade speed. Our design includes the ranges of motion required for convergent stereo, namely, vergence, version, and cyclotorsion. Here, we present DIJIT and some aspects of its performance. We also present a novel method for saccadic camera movements, using a direct relationship between camera orientation and motor values. The resulting saccadic camera movements are close to human movements in terms of their accuracy, with 1.17° and 1.14° mean error for the left and right cameras, respectively.

RA-L 2026-04-13

MaskGuide: Efficient Distillation for Deployable Lightweight Segmentation in Marine Environments

Xinrui Wu, Ziqiang Zheng, Yiwei Chen, Zeyu Ma, Yang Yang, Sai-Kit Yeung

Perception & Sensing
Abstract

The growing demand for efficient image segmentation in marine ecological studies is currently constrained by two key factors: the high computational requirements of models such as the Segment Anything Model (SAM) and the degraded accuracy of lightweight models in underwater environments. To overcome these challenges, we introduce a Three-stage Iterative Optimization (TIO) training framework and a decoupled knowledge distillation strategy, termed MaskGuide, both of which facilitate the development of our optimized Tiny-MSAM model on the edge device. This model significantly improves frame processing speed while maintaining high accuracy, achieving an optimal tradeoff between segmentation precision and inference speed in practical underwater and marine applications. Our research provides a feasible direction for deploying advanced segmentation models in marine and other resource-constrained scenarios. The Tiny-MSAM model, even when trained from scratch, contains only 0.5% of the parameters of the SAM model and 58% of those in MobileSAM. On existing underwater image segmentation benchmarks (e.g., the UIIS dataset), it outperforms MobileSAM by a large margin and reaches 99.4% of the performance of the SAM ViT-H variant.

RA-L 2026-04-13

Adaptive Human-Robot Collaborative Painting Combining Preference-Based Optimization and Dynamic Motion Primitives

C. Cella, M. Ristic, M. Faroni, A. M. Zanchettin, P. Rocco

Human-Robot Interaction / Teleoperation
Abstract

This work presents a human-centered collaborative framework that integrates Preference-Based Optimization (PBO) and Dynamic Movement Primitives (DMPs) to optimize robot-assisted tasks such as painting. The system allows the operator to perform the process while the robot adapts its behavior in real time, dynamically adjusting the orientation of the piece to match the orientation of the operator’s hand. The PBO framework leverages the GLISp algorithm to iteratively refine control parameters such as execution time, robot responsiveness, and rotation amplification through human feedback. Moreover, DMPs have been modified to enhance the reactive behavior of the robot and its adaptability to ergonomic requirements. The method was validated with a heterogeneous group of participants executing painting tasks. The results show that our strategy effectively reduces operator effort while optimizing process outcomes.

RA-L 2026-04-07

GLINS: GNSS-LiDAR-INS Integrated Navigation System

Jiahui Liu, Cheng Chi, Xin Zhang, Binlin Zhang, Dongen Li, Xingqun Zhan, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Reliable navigation in complex urban environments is challenging due to frequent GNSS signal outages and multipath interference. A practical mitigation is to incorporate LiDAR-inertial odometry (LIO) to bound drift during GNSS gaps and maintain local accuracy. However, commonly adopted scan-to-map constraints may yield inconsistencies when global corrections conflict with a static map in a tightly coupled framework. To address these issues, this letter presents GLINS, a measurement-level tightly coupled GNSS-LiDAR-INS system that features a consistent estimator design with explicit landmark states. GLINS extracts stable landmarks from the LiDAR voxel map and treats them as explicit variables in the state estimator, allowing the map representation to co-evolve with global corrections. The proposed estimator is implemented as a keyframe-based sliding-window factor graph that tightly fuses raw GNSS pseudorange, carrier-phase, and Doppler measurements with IMU preintegration and LiDAR landmark factors. A robust GNSS module performs fault detection and exclusion and carrier-phase ambiguity resolution, further improving reliability. Experiments on a public benchmark and proprietary urban driving sequences demonstrate that GLINS achieves improved estimator consistency compared to representative baselines and delivers high-precision, drift-free trajectories across diverse urban scenarios.

JFR 2026-05-04

AmphiHFW: Single Actuated Amphibious Mechanism With Undulation Fin and Leg‐Wheel Structure

Yang Tian, Shugen Ma, Takeki Ohira, Guoteng Zhang

Navigation / SLAM / Autonomous Driving
Abstract

Amphibious robots have garnered considerable attention and interest due to their versatility in various conditions. Recent advancements in hybrid mechanisms enhance robot adaptability but increase complexity, requiring more actuators and leading to intricate control systems and reduced robustness. This paper introduces a novel approach to amphibious mechanism design that integrates an undulation fin, legs, and wheels, all operated by a single actuator. With an analysis of several simple function mechanisms, a concept of the fin‐leg‐wheel combination is proposed. Through the analysis of the fin‐wheel structure and the exploration of soft materials, a novel driving method with a simple mechanism design is proposed. Subsequently, a leg‐wheel structure is adopted to enhance the robot's capabilities. Experimental results confirm that the developed robot prototype performs robustly under various conditions. Moreover, its ability to carry payloads significantly exceeding its own weight, combined with a low cost of transport, suggests that this mechanism can be readily adapted to enhance existing amphibious vehicles with multiple wheels.

RA-L 2026-04-06

Continuous-Space Multi-Agent Path Finding via Enhanced Prioritized Search With Ackermann Kinematic Constraints

Tianyuan Zhang, Lin Zhang, Qiyu Cai, Shoukun Wang, Junzheng Wang

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarms
Abstract

Multi-Agent Path Finding (MAPF) in complex environments remains challenging due to high computational complexity, frequent conflicts, and realistic motion constraints. Most existing methods focus on discrete spaces or idealized omnidirectional models, often neglecting or partially considering nonholonomic constraints, which limits their applicability to real-world robotic systems. This letter proposes a hierarchical continuous-space MAPF framework explicitly designed for Ackermann-steered robots, balancing computational efficiency, global coordination, and motion feasibility. In the path search layer, a spatiotemporal hybrid A* algorithm with an adaptive dynamic weighting factor improves the trade-off between computational cost and path quality, while a homotopy-group clustering mechanism provides structured agent grouping for conflict resolution. In the conflict resolution layer, a partial-order priority reconstruction and flexible priority-based dynamic adjustment strategy effectively reduce search space and conflict density. The trajectory optimization layer integrates a decentralized sequential quadratic programming method to ensure trajectory feasibility and smoothness. Comprehensive experiments, including ablation, comparative, and scalability studies, demonstrate that the proposed method achieves lower runtime, reduced execution cost, and better coordination than existing MAPF approaches, while maintaining strong scalability in high-density environments.

RA-L 2026-04-06

FAR-RIO: A Fast and Robust Radar-Inertial Odometry With Isotropic Uncertainty Model and Dual-Observation Update Pipeline

Hang Zhen, Zhi Gao, Ronghe Jin, Xinyu Guo, Feng Lin

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Owing to its ability to provide point clouds and Doppler velocity, as well as its adaptability to harsh weather conditions, 4D radar has emerged as a new option for Simultaneous Localization and Mapping (SLAM). However, there is limited research on both robustness and computational efficiency, which are essential requirements for deploying 4D radar as a specialized sensor in harsh weather. This paper proposes a fast and robust radar-inertial odometry (RIO) approach, named FAR-RIO. Specifically, we propose a dynamic point filtering method based on full covariance propagation, along with an isotropic uncertainty model for accurate registration. In particular, leveraging the dual-observation capability of 4D radar, we design a novel dual-observation update pipeline for the iterated error-state Kalman filter (IESKF), coupled with a corresponding keyframe selection strategy, significantly reducing computational load while minimizing accuracy loss. Our method approaches state-of-the-art performance on various sensors and runs significantly faster than existing methods. Our code is available at https://github.com/heqi-zh/FAR-RIO.git.

RA-L 2026-04-06

Towards Robot Skill Learning and Adaptation With Gaussian Processes

A K M Nadimul Haque, Fouad Sukkar, Sheila Sutjipto, Cedric Le Gentil, Marc G. Carmichael, Teresa Vidal-Calleja

Manipulation & Robot Arms · Robot Learning
Abstract

General robot skill adaptation requires expressive representations robust to varying task configurations. While recent learning-based skill adaptation methods refined via reinforcement learning (RL) have shown success, existing skill models often lack sufficient representational capacity for anything beyond minor environmental changes. In contrast, Gaussian process (GP)-based skill modelling provides an expressive representation with useful analytical properties; however, adaptation of GP-based skills remains underexplored. This paper proposes a novel, robust skill adaptation framework that utilises GPs with sparse via-points for compact and expressive modelling. The model leverages the demonstrated trajectory's first and second analytical derivatives to preserve the skill's kinematic profile. We present three adaptation methods to cater for the variability between initial and observed configurations. First, an optimisation agent adjusts the path's via-points while preserving the demonstration velocity. Second, a behaviour cloning agent is trained to replicate output trajectories from the optimisation agent. Third, an RL agent learns to modify via-points whilst maintaining the kinematic profile and providing faster feed-forward adaptation. Evaluated across three tasks (drawer opening, cube-pushing and bar manipulation) in both simulation and hardware, our proposed methods outperform every benchmark in success rates. Furthermore, the results demonstrate that the GP-based representation enables all three methods to attain high cosine similarity and low velocity magnitude errors, indicating strong preservation of the kinematic profile. Overall, our formulation provides a compact representation capable of adapting to large deviations from a single demonstrated skill.

RA-L 2026-04-06

Interactive Force-Impedance Control

Fan Shao, Satoshi Endo, Sandra Hirche, Fanny Ficuciello

Manipulation & Robot Arms · Control & Dynamics
Abstract

Human collaboration with robots requires flexible role adaptation, enabling the robot to switch between an active leader and a passive follower. Effective role switching depends on accurately estimating human intentions, which is typically achieved through external force analysis, nominal robot dynamics, or data-driven approaches. However, these methods are primarily effective in contact-sparse environments. When robots under hybrid or unified force–impedance control physically interact with active humans or non-passive environments, the robotic system may lose passivity and thus compromise safety. To address this challenge, this paper proposes a unified Interactive Force-Impedance Control (IFIC) framework that adapts to interaction power flow, ensuring safe and effortless interaction in contact-rich environments. The proposed control architecture is formulated within a port-Hamiltonian framework, incorporating both interaction and task control ports, thereby guaranteeing autonomous system passivity. Experiments in both rigid and soft contact scenarios demonstrate that IFIC ensures stable collaboration under active human interaction and reduces contact impact forces and interaction force oscillations.

RA-L 2026-04-06

Flapping Hydrofoil Propulsion for Uncrewed Surface Vehicles: Design and Experimental Evaluation

Luca Romanello, Felix Koch, Elias Zorgati, Daniel Gebhart, Pham Huy Nguyen, Mirko Kovac, et al.

UAVs / Aerial Robots · Legged / Quadruped Robots
Abstract

This study presents a novel hydrofoil-based propulsion system for water surface-operating robots, inspired by the paddling locomotion of aquatic species. Several hydrofoil designs were developed and evaluated using a custom Hydroflapper setup to assess their hydrodynamic performance. The proposed framework incorporates a camber-modulating mechanism to investigate whether dynamic camber variation can enhance thrust and propulsion efficiency. Initial simulations indicated that camber modulation increases horizontal thrust compared to symmetric hydrofoils, highlighting its potential benefits. A physical prototype was then constructed, featuring independent control of heave and pitch motions. Indoor and outdoor experiments were conducted to measure thrust generation and efficiency under various hydrofoil geometries and operating conditions, benchmarking results against conventional propeller-based systems. Experimental findings show that symmetric hydrofoils exhibit higher efficiency than propeller-driven systems when starting from rest, while camber-changing hydrofoils show reduced performance at the tested scale, likely due to mechanical limitations in the actuation mechanism. Complementary CFX simulations supported these results, showing minor efficiency gains for the cambered foils and suggesting that observed inefficiencies originate from mechanical rather than hydrodynamic constraints.

RA-L 2026-04-06

GVF-MPC Depth Control Framework Motivated by Restoring-Moment Mechanism Analysis for Bionic Robotic Fish

Jiarong Han, Xiangqing Yuan, Bo Yin, Yu Liu, Suli Zou, Zhihong Deng, et al.

Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Longitudinal oscillations are commonly observed in depth control of low-speed underwater platforms such as bio-inspired robotic fish. To investigate this issue, this paper conducts a quantitative analysis of the vertical-plane dynamics. It reveals the speed-dependent coupling between the depth and pitch channels as the swimming speed decreases, and clarifies the underdamped mechanism induced by restoring-moment dominance. A critical speed condition for the onset of low-speed oscillations is also derived. Based on the dynamic analysis, a hierarchical GVF–MPC depth control framework is proposed, which structurally decouples geometric guidance from dynamic execution. The guidance layer employs a guiding vector field (GVF) to generate naturally smooth attitude references from geometric information, with reduced sensitivity to model degradation and pitch disturbances. The inner loop applies model predictive control (MPC) to track the guidance commands while explicitly accounting for dynamic constraints, balancing convergence speed and control-input smoothness to improve stability and safety under low-speed conditions. Simulations and experiments are conducted to evaluate the effectiveness and robustness of the proposed method. Simulations consider sinusoidal and V-shaped depth paths and compare the proposed approach with LOS-, PID-, and SMC-based controllers. Experiments are performed in still water and under simulated Sea-State 1 disturbances to assess steady-state accuracy, disturbance recovery, and overall robustness.

RA-L 2026-04-06

Encoding Material Safety Using Control Barrier Functions for Soft Actuator Control

Nicholas Pagliocca, Behrad Koohbor, Mitja Trkov

Medical / Soft / Micro-Nano · Control & Dynamics
Abstract

Until recently, the concept of soft robot safety was an informal notion, often attributed solely to the fact that soft robots are less likely to damage their operating environment than rigid robots. As the field moves toward feedback control for practical applications, it becomes increasingly important to define what safety means and to characterize how soft robots can become unsafe. The unifying theme of soft robotics is to achieve useful functionality through deformation. Consequently, limitations in constitutive model accuracy and risks of material failure are inherent to all soft robots and pose a key challenge in designing provably safe controllers. This work introduces a formal definition of material safety based on strain energy functions and provides a controller that enforces it. We characterize safe and unsafe sets of an incompressible hyperelastic material and demonstrate that safety can be enforced using a high-order control barrier function (HOCBF) with quadratic program-based feedback control. As a case study, we consider material safety enforcement on theoretical models of soft actuators with a tubular geometry having inertial effects, first-order viscous effects, and full-state feedback. Simulation results verify that the proposed methodology can enforce the material safety specification.
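To illustrate the idea of bounding strain energy with a barrier function, the sketch below uses a deliberately simplified setting: a single uniaxial stretch `lam` of an incompressible neo-Hookean material, with assumed first-order kinematics d(lam)/dt = u. With one scalar input, the quadratic program has a closed-form solution. This is a first-order CBF under stated assumptions, not the paper's high-order formulation with inertial and viscous effects. All names and parameter values are hypothetical.

```python
def strain_energy(lam, mu=1.0):
    """Incompressible neo-Hookean energy density under uniaxial stretch lam."""
    return 0.5 * mu * (lam ** 2 + 2.0 / lam - 3.0)


def safe_stretch_rate(lam, u_nom, w_max, mu=1.0, alpha=5.0):
    """Rate closest to u_nom that satisfies dh/dt >= -alpha * h,
    where h = w_max - W(lam) and d(lam)/dt = u (assumed kinematics)."""
    a = mu * (lam - 1.0 / lam ** 2)                # dW/dlam
    b = alpha * (w_max - strain_energy(lam, mu))   # alpha * h
    if a == 0.0 or a * u_nom <= b:
        return u_nom                               # nominal command is already safe
    return b / a                                   # project onto the constraint boundary


def simulate(u_nom=1.0, w_max=0.5, dt=1e-3, steps=5000):
    """Forward-Euler rollout; returns the final stretch and peak energy."""
    lam, peak = 1.0, 0.0
    for _ in range(steps):
        lam += dt * safe_stretch_rate(lam, u_nom, w_max)
        peak = max(peak, strain_energy(lam))
    return lam, peak
```

In this toy rollout the nominal command keeps stretching the material, and the filter slows it down as the stored energy approaches `w_max`, so the state approaches the boundary of the safe set without crossing it.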

RA-L 2026-04-06

Corrections to “Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven DLO Manipulation”

Georgios Kamaras, Subramanian Ramamoorthy

Manipulation & Robot Arms · Perception & Sensing
Abstract

In the present article, we wish to correct three minor typographic issues of our recent publication [1]. These issues affect only the referenced algorithm lines and figures, and have no influence on the methodology, discussions, and conclusions of our paper. The corrected elements are given here along with an explanation of the corresponding amendments.

RA-L 2026-04-06

Autoregressive Meta-Actions for Unified Controllable Trajectory Generation in Autonomous Driving

Jianbo Zhao, Taiyu Ban, Xiyang Wang, Qibin Zhou, Hangning Zhou, Zhihao Liu, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Generating trajectories from high-level commands is critical for autonomous driving, but prevailing methods suffer from a flaw we term semantic misalignment. By associating long trajectories with a single, static meta-action (e.g., “lane change”), these methods corrupt training data during maneuver transitions, hindering the learning of robust command-following. To resolve this, we propose the Autoregressive Meta-Action (AMA) framework, a novel formulation that operates at a frame-wise level. It autoregressively predicts the joint distribution of the next-step meta-action and its corresponding state, ensuring strict semantic alignment at every timestep. We implement this framework using a flexible modular architecture where lightweight modules are fine-tuned atop a pre-trained foundation model. This modular design drastically reduces training costs for new commands and enables seamless switching between autonomous and command-following modes. We support our research by releasing an expanded Waymo Open Motion Dataset with dense, frame-level meta-action labels and validate our method's efficacy on real-world test vehicles.

RA-L 2026-03-31

Visual-Tactile Peg-in-Hole Assembly Learning From Peg-Out-of-Hole Disassembly

Yongqiang Zhao, Xuyang Zhang, Zhuo Chen, Matteo Leonetti, Emmanouil Spyrakos-Papastavridis, Shan Luo

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Peg-in-hole (PiH) assembly is a fundamental yet challenging robotic manipulation task. While reinforcement learning (RL) has shown promise in tackling such tasks, it requires extensive exploration. In this paper, we propose a novel visual-tactile skill learning framework for the PiH task that leverages its inverse task, i.e., peg-out-of-hole (PooH) disassembly, to facilitate PiH learning. Compared to PiH, PooH is inherently easier as it only needs to overcome existing friction without precise alignment, making data collection more efficient. To this end, we formulate both PooH and PiH as Partially Observable Markov Decision Processes (POMDPs) in a unified environment with shared visual-tactile observation space. A visual–tactile PooH policy is first trained; its trajectories, containing kinematic, visual and tactile information, are temporally reversed and action-randomized to provide expert data for PiH. In the policy learning, visual sensing facilitates the peg–hole approach, while tactile measurements compensate for peg–hole misalignment. Experiments across diverse peg–hole geometries show that the visual–tactile policy attains 6.4% lower contact forces than its single-modality counterparts, and that our framework achieves average success rates of 87.5% on seen objects and 77.1% on unseen objects, outperforming direct RL methods that train PiH policies from scratch by 18.1% in success rate. Demos, code, and datasets are available at https://sites.google.com/view/pooh2pih.
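The temporal-reversal step can be sketched in a few lines. The example below assumes a one-dimensional position trajectory and displacement actions, and it models "action-randomized" as additive Gaussian noise. Both choices and the function name `reverse_demonstration` are illustrative assumptions, since the abstract does not specify the state, action, or noise model.

```python
import random


def reverse_demonstration(states, noise_std=0.0, seed=None):
    """Turn a disassembly trajectory into an assembly demonstration.

    states: positions [s_0, ..., s_T] recorded while pulling the peg out.
    Returns (reversed_states, actions), where each action is the displacement
    to the next reversed state plus optional Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    reversed_states = list(reversed(states))
    actions = [
        (nxt - cur) + rng.gauss(0.0, noise_std)
        for cur, nxt in zip(reversed_states, reversed_states[1:])
    ]
    return reversed_states, actions
```

With `noise_std=0.0` the actions are exactly the negated, reordered displacements of the original trajectory, so replaying them from the final disassembly state retraces the path back into the hole.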

JFR 2026-04-27

GFT‐VINS: Robust Visual–Inertial Localization via Geometric Feature Track Selection

Hui Zhao, Jiarui Dou, Jianga Shang, Kangping Ji, You Li, Yan Li, et al.

Navigation / SLAM / Autonomous Driving · Control & Dynamics
Abstract

Visual–inertial navigation systems (VINSs) are a cornerstone technology for autonomous robotics, leveraging visual measurements and inertial sensor data to achieve accurate state estimation through feature tracking. However, current VINS approaches often indiscriminately incorporate all feature tracks without distinguishing which ones meaningfully contribute to estimation quality. This oversight leads to suboptimal accuracy, reduced stability, and computational inefficiency. To address these limitations, we propose a novel visual–inertial framework that intelligently prioritizes informative feature tracks using normal epipolar geometry. Our method enhances state estimation by jointly optimizing feature appearance similarity and observation consistency across frames, formalized as a submodular partition optimization problem. Extensive experiments on public benchmarks demonstrate that our approach significantly outperforms five state‐of‐the‐art methods, achieving a 26.1% improvement in accuracy, 43.3% higher stability, and real‐time processing speeds of up to 125 FPS. These advancements highlight the efficacy of selective feature track utilization in overcoming the limitations of conventional VINS.

JFR 2026-04-27

Fault Tolerant Attitude Control for Spacecraft Considering Input Delay and Actuator Saturation: Theory and Experiment

Yasamin Naderi Koupai, Maryam Malekzadeh, Shahram Hadian Jazi

Perception & Sensing · Control & Dynamics
Abstract

This study presents an advanced fault estimation and control strategy for spacecraft attitude control systems operating under time‐varying input delays and actuator saturation. An adaptive fast terminal sliding mode fault‐tolerant control (AFTSMFTC) framework is proposed to mitigate actuator faults, estimation errors, and input constraints. Two adaptive laws are designed to dynamically tune the control gains, thereby enhancing performance and robustness under fault conditions. A multipurpose sliding surface is formulated to ensure that the reaction wheels come to rest after maneuvering, improving momentum management and energy efficiency. Furthermore, a nonlinear disturbance observer (NDO) is developed to estimate external disturbances, unmodeled inertia variations, and time‐varying input delays, while an integrated fault detection mechanism identifies actuator failures in real time. The closed‐loop stability is rigorously proven using Lyapunov theory. The proposed controller and observer are validated through a high‐fidelity spacecraft attitude control simulator, with results demonstrating superior performance and robustness compared to state‐of‐the‐art methods.

JFR 2026-04-21

Visual 3D Spatiotemporal Fields‐Driven Obstacle Avoidance Using Model Predictive Control for Nursing Robots in Unstructured Environments

Guoqiang Fu, Yina Wang

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception and Sensing · Control and Dynamics
Abstract

Monocular detectors typically provide 2D features, which limit a nursing robot's ability to perceive and adapt to changes in complex 3D environments. To address this challenge, this paper proposes a monocular 3D object detection framework capable of estimating objects' 3D locations, sizes, velocities, and predicted trajectories. The framework employs a Nursing Robot Convolutional Neural Network (NRCNN) to generate 2D bounding boxes for objects, followed by a 2D‐pixel‐to‐3D‐world coordinate transformation algorithm to recover their 3D spatial positions. Subsequently, an ID‐assignment‐based tracking module associates detections across sequential D435i video frames, enabling prediction of objects' spatiotemporal dynamics. To enhance 2D detection accuracy, a multibranch efficient channel attention (MECA) module is embedded within a cross‐stage partial network in the backbone of the NRCNN detector. This design enhances interchannel and cross‐layer information interactions to adaptively capture the cross‐channel and cross‐layer dimensional features. It departs from the traditional approach of improving feature extraction solely through increasingly deep dilated convolutional network structures, thereby enabling more effective feature extraction. Additionally, a bi‐directional channel spatial feature fusion pyramid network (Bi‐CSFFPN) is integrated into the neck of the NRCNN detector to fuse multi‐level, cross‐channel, and spatial features. This approach overcomes the limitation of conventional feature fusion networks that only fuse features along a single dimension and results in substantial performance improvements. Using outputs from the monocular 3D object detector, a dynamic lattice map is constructed that integrates real‐time object volume, position, and velocity information. 
This map allows the nursing robot to plan the shortest feasible path from its initial position to the target location while avoiding moving obstacles via a model predictive control (MPC)‐based trajectory tracking controller. Extensive experimental results demonstrate that the proposed MECA‐CSP and Bi‐CSFFPN modules significantly improve NRCNN detection performance and enhance downstream tasks, including 3D object localization and dynamic lattice‐map‐based MPC obstacle avoidance control.

RA-L 2026-03-30

G-MAPP: GPU-Accelerated Multi-Agent Planning and Perception for Reactive Motion Generation

Tanmay Bishnoi, Riddhiman Laha, Tobias Löw, Jose Alex Chandy, Luis F. C. Figueredo, Sami Haddadin

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception and Sensing · Multi-Robot / Swarm
Abstract

Reactive motion generation in unstructured environments remains an open challenge in robotics. Due to the computational complexity of collision-free motion generation, existing methods either generate global trajectories for static scenarios, or employ models that make conservative assumptions about the environment. This paper identifies the primary bottleneck as the runtime performance demand of planning on high-fidelity environments, and the temporal integration between the perception and planning modules. Therefore, we propose a framework that does not compromise on runtime performance and world representations for perception and planning by accelerating world modeling and vector-field based planning using the GPU. This allows us to achieve faster parallel state exploration for quasi-global trajectory planning, and tighter coupling of the perception-action loop in real-time for dynamic cluttered environments with off-the-shelf depth sensors. We quantitatively evaluate the computation-time and success rate differences for the CPU and GPU versions of our planner, and perform qualitative evaluations of our coupled framework using real-world experiments on a 7-DoF Franka Emika robot. Experimental results demonstrate that our GPU-based framework achieves up to a 5x speedup over the CPU version and successfully avoids collisions across both trivial and challenging physical world scenarios. The implementation is available at: https://github.com/chart-research/g-mapp

RA-L 2026-03-30

AutoTrialGen: Automated Data Generation From Few Human Demonstrations via Trajectory Annotation and Simulation Trials

Huailiang Ma, Aiguo Song, Mutian He, Mingyu Li, Yibing Yan, Linhu Wei

Manipulation and Robot Arms · Robot Learning · Perception and Sensing
Abstract

While imitation learning is a powerful paradigm for teaching robots complex manipulation skills, its effectiveness is often bottlenecked by the need for large-scale, human-collected datasets. This paper presents AutoTrialGen, an automated framework designed to generate large and diverse datasets of successful simulated demonstrations through trial-and-error, from only a few human demonstrations. Initially, AutoTrialGen employs a vision-language model to automatically decompose unprocessed human demonstrations into a library of object-centric and reusable skill primitives. Within a simulated environment, it then intelligently composes these primitives to solve new task instances, employing a novel weighted manipulability selection mechanism to ensure the quality and efficiency of the generated trajectories. Policies trained using the proposed sim-augmented data demonstrate improved performance and enhanced data efficiency across six diverse and challenging real-world manipulation tasks, compared with the baseline method.

RA-L 2026-03-30

CSC-FMT*: Efficient 3D Path Planning Via Cylindrical Space-Cutting Fast Marching Tree

Maoyun Zhao, Depeng Liu, Peng Liu, Han Wang, Guofeng Shen

Navigation / SLAM / Autonomous Driving · Perception and Sensing · Medical / Soft / Micro-Nano · Control and Dynamics
Abstract

Sampling-based path planning algorithms like the Fast Marching Tree (FMT*) often suffer from redundant exploration and inefficient sampling in complex 3D environments. To tackle these issues, this paper proposes the Cylindrical Space-Cutting Fast Marching Tree (CSC-FMT*) algorithm. The core of our approach is a dynamic cylindrical space-cutting mechanism. For each expanding node, it computes the central line towards the goal and leverages edge obstacle point clouds to determine the radius of the largest collision-free cylinder. This effectively prunes invalid search regions, reducing redundant sampling and costly collision checks. Coupled with a node deactivation strategy, this mechanism guides the tree expansion efficiently toward the goal. To enhance practicality for robotic execution, a post-processing pipeline incorporating B-spline smoothing and geometry-aware singularity optimization is applied, generating final trajectories that are both smooth and singularity-free. Extensive evaluations in complex 3D scenarios demonstrate that CSC-FMT* outperforms mainstream methods including PRM*, HBFMT*, and FMT* in terms of planning time, path length, and stability. The proposed algorithm also shows great potential for applications in complex medical robotic systems, such as Magnetic Resonance-guided High-Intensity Focused Ultrasound (MRI-HIFU), where planning efficiency and trajectory stability are paramount for safely navigating critical anatomical structures.
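
The space-cutting step can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: obstacles are assumed to be a plain list of 3D points, the axis is the segment from the expanding node to the goal, and the function names are hypothetical. Distances are clamped to the segment, so the free region is strictly a capsule, which gives a radius no larger than that of the true cylinder.

```python
import math

def point_to_segment_distance(p, a, b):
    """Distance from 3D point p to the segment a-b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:
        return math.dist(p, a)  # degenerate axis: node coincides with goal
    # Projection parameter of p onto the axis, clamped to the segment.
    t = sum(ap[i] * ab[i] for i in range(3)) / ab_len2
    t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)

def free_cylinder_radius(node, goal, obstacle_points, max_radius=float("inf")):
    """Largest radius around the node-to-goal axis that contains no obstacle point."""
    radius = max_radius
    for p in obstacle_points:
        radius = min(radius, point_to_segment_distance(p, node, goal))
    return radius
```

Samples falling inside the returned radius around the axis need no further collision check against these points, which is the source of the pruning the abstract describes.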

RA-L 2026-03-30

Towards Deploying VLA Without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

Zhuo Li, Junjia Liu, Zhipeng Dong, Tao Teng, Quentin Rouxel, Darwin Caldwell, et al.

Manipulation and Robot Arms · Robot Learning · Perception and Sensing
Abstract

Vision-Language-Action (VLA) models have demonstrated significant potential in real-world robotic manipulation. However, pre-trained VLA policies still suffer from substantial performance degradation during downstream deployment. Although fine-tuning can mitigate this issue, its reliance on costly demonstration collection and intensive computation makes it impractical in real-world settings. In this work, we introduce VLA-Pilot, a plug-and-play inference-time policy steering method for zero-shot deployment of pre-trained VLA without any additional fine-tuning or data collection. We evaluate VLA-Pilot on both simulation and real-world experiments across distinct robotic embodiments. Experimental results demonstrate that VLA-Pilot substantially boosts the success rates of off-the-shelf pre-trained VLA policies, enabling robust zero-shot generalization to diverse downstream tasks and embodiments. Experimental videos and code are available at: https://rip4kobe.github.io/vla-pilot/

RA-L 2026-03-30

Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

Ziyan Wang, Peng Chen, Ding Li, Chiwei Li, Qichao Zhang, Zhongpu Xia, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Multi-Robot / Swarm
Abstract

Learning diverse and high-fidelity traffic simulations from human driving demonstrations is crucial for autonomous driving evaluation. The recent next-token prediction (NTP) paradigm, widely adopted in large language models (LLMs), has been applied to traffic simulation and achieves iterative improvements via supervised fine-tuning (SFT). However, such methods limit active exploration of potentially valuable motion tokens, particularly in suboptimal regions. Entropy patterns provide a promising perspective for enabling exploration driven by motion token uncertainty. Motivated by this insight, we propose a novel tokenized traffic simulation policy, R1Sim, which represents an initial attempt to explore reinforcement learning based on motion token entropy patterns, and systematically analyzes the impact of different motion tokens on simulation outcomes. Specifically, we introduce an entropy-guided adaptive sampling mechanism that focuses on previously overlooked motion tokens with high uncertainty yet high potential. We further optimize motion behaviors using Group Relative Policy Optimization (GRPO), guided by a safety-aware reward design. Overall, these components enable a balanced exploration–exploitation trade-off through diverse high-uncertainty sampling and group-wise comparative estimation, resulting in realistic, safe, and diverse multi-agent behaviors. Extensive experiments on the Waymo Sim Agent benchmark demonstrate that R1Sim achieves competitive performance compared to state-of-the-art methods.
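
Two quantities named in this abstract have simple standard forms, sketched below. This is a generic illustration rather than R1Sim's code: `token_entropy` is the Shannon entropy of a categorical distribution over motion tokens, and `group_relative_advantages` is the usual GRPO-style standardization of rewards within one sampled group. The function names and the `eps` constant are assumptions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution over motion tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward against the mean and std of its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]
```

A high-entropy token position marks a step where the policy is uncertain, which is where an entropy-guided sampler would concentrate exploration. The group-relative advantage then ranks the sampled rollouts against each other without a learned value function.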

JFR 2026-04-20

Combining Neural Network and RRT*: A Novel Path Planning Method for Hyper‐Redundant Manipulators With 2N + 1 DOF

Zhe Wang, Dean Hu, Detao Wan, Chang Liu, Wei Xiao

Manipulation and Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Path planning has long been a research focus for hyper‐redundant manipulators. However, in complex environments, the process often entails substantial computational costs. This issue is compounded by kinematic constraints. Therefore, achieving efficient path planning under these conditions remains a significant challenge. To address this issue, this paper proposes a novel path planning method for hyper‐redundant manipulators with 2n + 1 degrees of freedom (DOF), which integrates a Neural Network model with the Rapidly‐exploring Random Tree Star (RRT*) algorithm. First, a general kinematic model of the manipulator is established. Based on this model, the relationship between the manipulator's joint angles and the path corner angle is analyzed. This relationship is then incorporated into the RRT* algorithm to ensure that the generated paths comply with the kinematic constraints of the manipulator. Next, a Neural Network model is developed and trained using a dataset generated by the improved RRT* algorithm. The trained model predicts path points toward a designated goal. To address potential non‐ideal path points in the predictions, a modification strategy combining the Artificial Potential Field (APF) method and a spatial meshing strategy is introduced. This strategy employs a dual adjustment mechanism to reposition non‐ideal path points, enabling obstacle avoidance and mitigating the issue of bilateral distribution in the predicted path points. The predicted path points are then incorporated into the sampling process of the improved RRT* algorithm to guide tree expansion, thereby accelerating the path search. Additionally, the APF method is integrated into the search mechanism to further enhance planning efficiency. A series of experiments was conducted to evaluate the performance of the proposed method. 
The results demonstrate that the manipulator successfully reaches designated targets along the generated paths while avoiding obstacles and adhering to kinematic constraints. Compared to the RRT and RRT* algorithms, the proposed method shows superior overall performance, particularly in computation time, where it achieves a reduction of over 50% compared to RRT*. Prototype experiments further confirm the feasibility and safety of the paths generated by the proposed approach.
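
The idea of feeding predicted path points into the sampling process can be sketched as a biased sampler. This is a minimal illustration, not the paper's algorithm: the mixing probability `guide_prob`, the axis-aligned `bounds`, and the function name are all hypothetical, and the APF correction and kinematic constraints are left out.

```python
import random

def guided_sample(predicted_points, bounds, guide_prob=0.5, rng=random):
    """Return one sample for tree expansion.

    With probability guide_prob, pick one of the network-predicted path points;
    otherwise sample uniformly inside the axis-aligned bounds.
    """
    if predicted_points and rng.random() < guide_prob:
        return tuple(rng.choice(predicted_points))
    return tuple(rng.uniform(lo, hi) for lo, hi in bounds)
```

Keeping a nonzero share of uniform samples preserves the exploration that RRT*-style planners rely on, while the guided share pulls the tree toward the predicted corridor.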

JFR 2026-04-19

Research on Collision Avoidance Methods for Logistics Unmanned Aerial Vehicle Based on Dynamic Controlled Interactive Collaborative Fusion

Yuetan Zhang, Honghai Zhang, Weibin Tang, Wei Pan, Chenfeng Zhang, Caiyu Dai, et al.

UAV / Aerial Robots · Perception and Sensing · Multi-Robot / Swarm
Abstract

As urban air‐traffic density rises, the risk of mid‐air collisions among low‐altitude logistics drones has become a critical concern. We therefore introduce Dynamic Controlled Interactive Collaborative Fusion (DCICF), a framework that couples multimodal perception with Probabilistic Reciprocal Velocity Obstacles (P‐RVO) modeling to enable safe, cooperative flight in dynamic environments. Simulations with 50 static obstacles yielded zero collisions; in dynamic‐obstacle scenarios the collision rate was 0.083 while each drone completed an average of 2.400 missions; and in clustered multi‐drone tasks the collision rate fell to 0.043 with 2.350 missions per drone. Hardware tests further demonstrated an average decision latency of 120–150 ms, minimum separation distances of > 0.8 m indoors and > 1.2 m outdoors, and a 100% mission‐success rate. Relative to state‐of‐the‐art baselines such as DQN, MILP and RRT*, DCICF offers superior safety, real‐time responsiveness, and scalability, making it a robust solution for urban logistics drone operations.
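
For readers unfamiliar with velocity obstacles, the deterministic test underlying the idea is sketched below. It is a simplification and not the paper's P‐RVO model: it assumes two disc-shaped agents in 2D with constant velocities, ignores uncertainty and reciprocity, and uses hypothetical function names.

```python
import math

def time_to_collision(p_a, v_a, p_b, v_b, combined_radius):
    """Earliest t >= 0 at which two discs at constant velocity touch, else None."""
    px, py = p_a[0] - p_b[0], p_a[1] - p_b[1]  # relative position
    vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]  # relative velocity
    c = px * px + py * py - combined_radius ** 2
    if c <= 0.0:
        return 0.0  # already overlapping
    a = vx * vx + vy * vy
    if a == 0.0:
        return None  # no relative motion, the gap never closes
    b = 2.0 * (px * vx + py * vy)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the paths miss each other
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else None

def in_velocity_obstacle(p_a, v_a, p_b, v_b, combined_radius, horizon):
    """True if keeping the current velocities leads to contact within the horizon."""
    t = time_to_collision(p_a, v_a, p_b, v_b, combined_radius)
    return t is not None and t <= horizon
```

A planner built on this test would reject or penalize candidate velocities for which `in_velocity_obstacle` returns true. A probabilistic variant would replace the hard test with a collision probability under position and velocity uncertainty.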

RA-L 2026-04-09

Smooth Feedback Motion Planning With Reduced Curvature

Aref Amiri, Steven M. LaValle

Navigation / SLAM / Autonomous Driving
Abstract

Feedback motion planning over cell decompositions provides a robust method for generating collision-free robot motion with formal guarantees. However, existing algorithms often produce paths with unnecessary bending, leading to slower motion and higher control effort. This paper presents a computationally efficient method to mitigate this issue for a given simplicial decomposition. A heuristic is introduced that systematically aligns and assigns local vector fields to produce more direct trajectories, complemented by a novel geometric algorithm that constructs a maximal star-shaped chain of simplexes around the goal. This creates a large “funnel” in which an optimal, direct-to-goal control law can be safely applied. Simulations demonstrate that our method generates measurably more direct paths, reducing total bending by an average of 91.40% and LQR control effort by an average of 45.47%. Furthermore, comparative analysis against sampling-based and optimization-based planners confirms the time efficacy and robustness of our approach. While the proposed algorithms work over any finite-dimensional simplicial complex embedded in the collision-free subset of the configuration space, the practical application focuses on low-dimensional ($d \le 3$) configuration spaces, where simplicial decomposition is computationally tractable.

RA-L 2026-04-09

Center-of-Mass and Attitude Control of an Orbital Manipulator: A Novel Control Strategy

Francesco Sena, Ria Vijayan, Hrishik Mishra, Marco De Stefano

Manipulation and Robot Arms
Abstract

In this work, we propose a novel control strategy for the displacement of an orbital manipulator equipped with a thruster module at its end-effector. In contrast to the use of base-mounted thrusters, which often leads to allocation problems and fuel inefficiencies, the proposed approach achieves center-of-mass (CoM) maneuvers using only the end-effector thruster. The main challenge is to translate the CoM while maintaining the satellite base attitude fixed and avoiding angular momentum generation. The strategy consists of two phases: first, the manipulator orients the thruster while maintaining the base orientation via reaction wheels; second, the thruster generates an external force required to relocate the CoM while conserving attitude. This coordinated use of manipulator motion and end-effector actuation enables effective CoM control under attitude constraints. The proposed controller is validated with multi-body dynamic simulations and a hardware-in-the-loop robotic facility.

RA-L 2026-04-09

FlowSSC: Universal Generative Monocular Semantic Scene Completion via One-Step Latent Diffusion

Zichen Xi, Hao-Xiang Chen, Nan Xue, Hongyu Yan, Qi-Yuan Feng, Levent Burak Kara, et al.

Robot Learning
Abstract

Semantic Scene Completion (SSC) from monocular RGB images is a fundamental yet challenging task due to the inherent ambiguity of inferring occluded 3D geometry from a single view. While feed-forward methods have made progress, they often struggle to generate plausible details in occluded regions and preserve the fundamental spatial relationships of objects. Such accurate generative reasoning capability for the entire 3D space is critical in real-world applications. In this paper, we present FlowSSC, the first generative framework applied directly to monocular semantic scene completion. FlowSSC treats the SSC task as a conditional generation problem and can seamlessly integrate with existing feed-forward SSC methods to significantly boost their performance. To achieve real-time inference without compromising quality, we introduce Shortcut Flow-matching that operates in a compact triplane latent space. Unlike standard diffusion models that require hundreds of steps, our method utilizes a shortcut mechanism to achieve high-fidelity generation in a single step, enabling practical deployment in autonomous systems. Extensive experiments on SemanticKITTI demonstrate that FlowSSC achieves state-of-the-art performance, significantly outperforming existing baselines.

RA-L 2026-04-09

Direct Sparse Initialization for Stereo Visual-Inertial Odometry

Junyin Qiu, Jianglin Lan

Navigation / SLAM / Autonomous Driving
Abstract

Existing stereo visual-inertial initialization methods, whether tightly or loosely coupled, rely critically on intermediate variables like feature correspondences and camera poses rather than original image data. Computing these variables through feature tracking and Structure-from-Motion (SfM) inherently introduces errors, adversely affecting results. To overcome this limitation, we propose a direct initialization method for stereo visual-inertial odometry, which directly bridges original image intensities and initial parameters, bypassing conventional intermediate variables. Specifically, we introduce a prediction function to compute the corresponding points from the initial parameters. Then we formulate an objective function that optimizes initial parameters by minimizing the photometric error of sparse points, eliminating the need for feature tracking and SfM. The metric scale in our initialization is directly determined by the stereo baseline. We further propose an approximation method for two-frame initialization, demonstrating its efficacy even with minimal frame data. Extensive experiments confirm that our method achieves superior performance in both estimation accuracy and initialization success rate with shorter runtime. Even with 3 frames for initialization, our method outperforms the state-of-the-art methods using 10 frames in most metrics.

RA-L 2026-03-27

ContractionPPO: Certified Reinforcement Learning via Differentiable Contraction Layers

Vrushabh Zinage, Narek Harutyunyan, Eric Verheyden, Fred Y. Hadaegh, Soon-Jo Chung

Legged / Quadruped Robots · Manipulation and Robot Arms · Robot Learning · Control and Dynamics
Abstract

Legged locomotion in unstructured environments demands not only high-performance control policies but also formal guarantees to ensure robustness under perturbations. Control methods often require carefully designed reference trajectories, which are challenging to construct in high-dimensional, contact-rich systems such as quadruped robots. In contrast, Reinforcement Learning (RL) directly learns policies that implicitly generate motion, and uniquely benefits from access to privileged information, such as full state and dynamics during training, that is not available at deployment. We present ContractionPPO, a framework for certified robust planning and control of legged robots by augmenting Proximal Policy Optimization (PPO) RL with a state-dependent contraction metric layer. This approach enables the policy to maximize performance while simultaneously producing a contraction metric that certifies incremental exponential stability of the simulated closed-loop system. The metric is parameterized as a Lipschitz neural network and trained jointly with the policy, either in parallel or as an auxiliary head of the PPO backbone. While the contraction metric is not deployed during real-world execution, we derive upper bounds on the worst-case contraction rate and show that these bounds ensure the learned contraction metric generalizes from simulation to real-world deployment. Our hardware experiments on quadruped locomotion demonstrate that ContractionPPO enables robust, certifiably stable control even under strong external perturbations. Videos of experiments are available at https://contractionppo.github.io/.
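
For context, the textbook contraction condition behind "incremental exponential stability" can be written as follows. This is the standard statement from contraction theory, not a formula quoted from the paper. The symbols are assumptions chosen for illustration: closed-loop dynamics $\dot{x} = f(x)$, a uniformly positive definite metric $M(x)$, a rate $\lambda > 0$, and $\|\delta x\|_{M} = \sqrt{\delta x^{\top} M(x)\, \delta x}$.

```latex
\dot{M}(x) + M(x)\,\frac{\partial f}{\partial x}
  + \left(\frac{\partial f}{\partial x}\right)^{\top} M(x)
  \preceq -2\lambda\, M(x)
\quad\Longrightarrow\quad
\|\delta x(t)\|_{M} \le e^{-\lambda t}\,\|\delta x(0)\|_{M}
```

Any two trajectories of the closed-loop system then converge toward each other exponentially at rate $\lambda$, which is the property a learned metric is meant to certify.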

RA-L 2026-03-27

Flex: End-to-End Text-Instructed Visual Navigation From Foundation Model Features

Makram Chahine, Alex Quach, Alaa Maalouf, Tsun-Hsuan Wang, Daniela Rus

UAV / Aerial Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception and Sensing
Abstract

End-to-end learning directly maps sensory inputs to actions, creating highly integrated and efficient policies for complex robotics tasks. However, such models often struggle to generalize beyond their training scenarios, limiting adaptability to new environments, tasks, and concepts. In this work, we investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies under unseen text instructions and visual distribution shifts. Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors, generating spatially aware embeddings that integrate semantic and visual information. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning on a small simulated dataset (with zero real-world images) successfully generalize to real-world scenes with diverse novel goals and command formulations.

RA-L 2026-03-27

ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving

Yongming Chen, Miner Chen, Qunyi Zhang, Liewen Liao, Mingyang Jiang, Xiang Zuo, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception and Sensing
Abstract

Reinforcement learning (RL) in autonomous driving employs a trial-and-error mechanism, enhancing robustness in unpredictable environments. However, crafting effective reward functions remains challenging, as conventional approaches rely heavily on manual design and demonstrate limited efficacy in complex scenarios. To address this issue, this study introduces a responsibility-oriented reward function that explicitly incorporates traffic regulations into the RL framework. Specifically, we first use a vision language model (VLM) to determine the liability in traffic collisions and propose reward signals. To mitigate VLM hallucination and ensure regulatory grounding, we propose a Traffic Regulation Knowledge Graph (TRKG) that structures unstructured traffic laws into a queryable ontology of driving scenarios and liability standards. This mechanism retrieves the precise regulatory context required to quantify accident responsibility (primary/shared/secondary), which is then used to modulate the crash penalty in the reward function. Building on these liability-informed signals, we construct a reward function and employ it to train the agent, thereby encouraging behavior that more faithfully adheres to traffic regulations. Experimental validations show that our approach achieves task success rates of 73.2% (intersection) and 54.0% (roundabout). Compared with the original policy, the success rate improves by +8.2 pp/+11.2 pp, while the ego vehicle's primary-liability share decreases by 13.5 pp/5.7 pp. The code for this work is available at https://github.com/Songan-Lab/ROAD
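
The phrase "modulate the crash penalty" can be made concrete with a toy sketch. Everything numeric here is a placeholder: the abstract gives neither the weights nor the base penalty, so `LIABILITY_WEIGHT`, `base_penalty`, and the function name are hypothetical and only show the shape of a liability-scaled penalty.

```python
# Hypothetical weights per liability level; the abstract does not state the actual values.
LIABILITY_WEIGHT = {"primary": 1.0, "shared": 0.5, "secondary": 0.2}

def crash_penalty(liability, base_penalty=10.0):
    """Return a negative reward whose magnitude grows with the ego vehicle's liability."""
    return -base_penalty * LIABILITY_WEIGHT[liability]
```

Under this shaping, a collision the ego vehicle mainly caused costs more than one it could hardly have avoided, which pushes the learned policy toward regulation-compliant behavior rather than blanket caution.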

RA-L 2026-03-26

Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning

Leixin Chang, Xinchen Yao, Ben Liu, Liangjing Yang, Hua Chen

Legged / Quadruped Robots · Navigation / SLAM / Autonomous Driving · Robot Learning · Control and Dynamics
Abstract

On-policy reinforcement learning (RL) algorithms have demonstrated great potential in robotic control, where effective exploration is crucial for efficient and high-quality policy learning. However, how to encourage the agent to explore better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing the policy entropy or encouraging visits to novel states regardless of the potential state value. We propose a new form of directed exploration that uses analytical policy gradients from a differentiable dynamics model to inject task-aware, physics-guided guidance, thereby steering the agent towards high-reward regions for accelerated and more effective policy learning. We integrate our exploration approach into a widely used on-policy RL algorithm, Proximal Policy Optimization, and demonstrate its effectiveness through extensive benchmark experiments. We further test our approach on a 6-DOF point-foot robot for velocity-tracking locomotion, first in simulation and then through a successful sim-to-real deployment as the ultimate validation.

RA-L 2026-03-26

DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding

Qinghongbing Xie, Zijian Liang, Fuhao Li, Long Zeng

Manipulation and Robot Arms · Navigation / SLAM / Autonomous Driving · Perception and Sensing
Abstract

Effective scene representation is critical for reasoning-based visual grounding. However, existing 3D Visual Grounding methods either focus only on geometric and visual cues or, like traditional 3D scene graphs, lack the multi-dimensional attributes needed for complex reasoning. To bridge this gap, we introduce a novel scene representation framework, Diverse Semantic Map (DSM), that enriches a robust geometric map with a spectrum of VLM-derived semantics, including appearance, physical, and affordance attributes. The DSM is first constructed online by fusing multi-view observations within a temporal sliding window, creating a comprehensive map of scene knowledge. Building on this foundation, we propose a new grounding paradigm, DSM-Grounding, that shifts grounding from free-form queries to a structured reasoning process by a VLM over the semantically rich map. Extensive evaluations validate our approach's superiority, improving accuracy and interpretability. DSM-Grounding achieves a state-of-the-art 59.06% overall accuracy at IoU@0.5, surpassing others by 10% on ScanRefer. In semantic segmentation, our DSM attains a 67.93% F-mIoU, outperforming all baselines on Replica. Furthermore, successful deployment on physical robots for complex navigation and grasping tasks confirms the framework's practical utility in real-world scenarios.

RA-L 2026-03-26

ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

Young-Chae Son, Dae-Kwan Ko, Yoon-Ji Choi, Soo-Chul Lim

Robot Learning · Perception and Sensing · Human-Robot Interaction / Teleoperation
Abstract

In recent human-robot collaboration environments, there is a growing focus on integrating diverse sensor data beyond visual information to enable safer and more intelligent task execution. Although thermal data can be crucial for enhancing robot safety and operational efficiency, its integration has been relatively overlooked in prior research. This paper proposes a novel Vision-Language-Action (VLA) framework that incorporates thermal information for robot task execution. The proposed system leverages a Vision-Language Model (VLM) as a high-level planner to interpret complex natural language commands and decompose them into simpler sub-tasks. This approach facilitates efficient data collection and robust reasoning for complex operations. Unlike conventional methods that rely solely on visual data, our approach integrates thermal information, enabling the robot to perceive physical properties and proactively ensure environmental safety. Experimental results from real-world task scenarios validate the feasibility of our proposed framework, suggesting its potential to enhance task success rates and safety compared to existing vision-based systems.

RA-L 2026-03-26

Safe Adaptive Control With Vanishing Conservativeness for Robotic Systems With Unknown Dynamics via Barrier Functions

Kasra Sinaei, Donald Ebeigbe

Legged / Quadruped Robots · Manipulation and Robot Arms · Navigation / SLAM / Autonomous Driving · Control and Dynamics
Abstract

Adaptive controllers offer powerful tools for controlling nonlinear systems with parametric uncertainties, yet ensuring safety, particularly during the initial learning phase, remains a significant challenge. Conventional safety approaches often rely on filtering the control output using Control Barrier Functions (CBFs), which may fail to prevent unsafe behavior when parameter estimates are poor and can lead to reckless or overly conservative actions. This paper introduces a novel framework that directly integrates safety constraints into the parameter adaptation process itself. We propose enforcing safety by modulating the parameter update law through a minimally invasive perturbation, calculated via a real-time Quadratic Program (QP). This QP incorporates constraints derived from both a Robust Adaptive Control Lyapunov Function (RaCLF), ensuring system stability, and a CBF, guaranteeing forward invariance of a safe set. Critically, the CBF constraint leverages a formally derived, time-varying upper bound on the parameter estimation error, mitigating conservativeness as the system learns. We provide formal proofs for the stability of the closed-loop system and the validity of the safety constraint. The efficacy and practicality of the proposed safe parameter update strategy are demonstrated across diverse platforms, including numerical simulations of a mass-damper system and a mobile robot, high-fidelity simulation of a UR10 manipulator, and hardware experiments on a quadrupedal robot navigating amidst obstacles.
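
The "minimally invasive perturbation" idea has a closed form when only one affine constraint is active, sketched below. This is a simplification and not the paper's QP, which combines RaCLF and CBF constraints: here a nominal update vector is corrected by the smallest-norm `delta` that satisfies a single constraint `a . (nominal + delta) >= b`. All names are hypothetical.

```python
def min_norm_correction(nominal, a, b):
    """Solve min ||delta||^2 subject to a . (nominal + delta) >= b in closed form."""
    a_dot_nom = sum(ai * ni for ai, ni in zip(a, nominal))
    if a_dot_nom >= b:
        return [0.0] * len(nominal)  # constraint already holds, leave the update untouched
    a_norm2 = sum(ai * ai for ai in a)
    if a_norm2 == 0.0:
        raise ValueError("constraint is violated and a is zero, so no correction exists")
    # Project onto the constraint boundary along the direction of a.
    scale = (b - a_dot_nom) / a_norm2
    return [scale * ai for ai in a]
```

The correction is zero whenever the nominal update is already safe, so the adaptation law is altered only when the constraint would otherwise be violated. With several constraints, a numerical QP solver takes the place of this projection.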

RA-L 2026-03-26

Advances in Hybrid Modular Climbing Robots: Design Principles and Refinement Strategies

Ryan Poon, Ian Hunter

Legged / Quadruped Robots · Manipulation and Robot Arms · Navigation / SLAM / Autonomous Driving
Abstract

This paper explores the design strategies for hybrid pole- or trunk-climbing robots, focusing on methods to inform design decisions and assess metrics such as adaptability and performance. A wheeled-grasping hybrid robot with modular, tendon-driven grasping arms and a wheeled drive system mounted on a turret was developed to climb columns of varying diameters. Here, the key innovation is the underactuated arms that can be adjusted to different column sizes by adding or removing modular linkages, though the robot also features capabilities like self-locking (the ability of the robot to stay on the column by friction without power), autonomous grasping, and rotation around the column axis. Mathematical models describe conditions for self-locking and vertical climbing. Experimental results demonstrate the robot's efficacy in climbing and self-locking, validating the proposed models and highlighting the potential for fully automated solutions in industrial applications. This work provides a comprehensive framework for evaluating and designing hybrid climbing robots, contributing to advancements in autonomous robotics for environments where climbing tall structures is critical.

RA-L 2026-03-26

Geometry-Aware 6-DoF Grasp Learning From Stacked Point Clouds in Unstructured Scenes

Xiaohan Li, Zhejian Zhang, Yadan Zeng, Shengjun Xu, I-Ming Chen

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

To address the challenge of robust 6-DoF grasp prediction in cluttered, unstructured environments, this paper proposes a geometry-aware two-stage grasping framework. First, a plug-and-play Structural Edge-aware Keypoint Extractor is introduced to selectively retain salient edge geometry while mitigating point loss during downsampling. Second, a graspable feature extractor of GraspLAKAN-Net is developed to extract graspable features from stacked point clouds by integrating local aggregation with nonlinear mapping via an improved KAN-based architecture. To further enhance cross-object generalization, a coarse-to-fine Geometry-Aware Grasp Pose Estimation strategy is designed, enabling accurate and explainable 6-DoF pose prediction across both surface-dominant and edge-dominant objects. The proposed method is evaluated on a newly constructed MSDRG dataset comprising both simulated and real-world multi-object scenes. Comprehensive experimental results indicate the effectiveness of the proposed framework, which significantly outperforms state-of-the-art baselines in both recognition precision and operational efficiency.

RA-L 2026-03-26

Failure Detection With Zero-Shot Error Correction in Robotic Manipulation

Haojie Luo, Xizhou Bu, Hongbo Wang, Shijie Guo, Junxiu Liu, Wei Li

Manipulation & Robot Arms · Robot Learning · Perception & Sensing · Control & Dynamics
Abstract

Diffusion Policy (DP) is effective for imitation learning in robotic manipulation, yet likelihood-based replanning methods lack self-correction and often fail once execution deviates. Vision-Language Models (VLMs) offer strong spatial reasoning for failure handling but incur prohibitive latency for real-time use. We propose a two-stage Failure Detection and Error Correction (FDEC) framework: a lightweight visual dynamics model performs fast predictive monitoring to flag potential failures, and a threshold switch triggers a VLM for high-precision verification and adaptive correction only when needed. On RLBench and physical platforms, FDEC improves detection speed by 2.1× over pure VLM-based detectors and raises task success by 26.7%, demonstrating a practical trade-off between responsiveness and generalization for real-world manipulation.
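The two-stage structure described above (a cheap monitor on every step, an expensive verifier only when a threshold is crossed) can be sketched as follows. This is a minimal sketch under stated assumptions, not the FDEC implementation: the mean-absolute-error score, the function name `monitor_step`, and the `slow_verifier` callback standing in for the VLM call are all hypothetical.

```python
from typing import Callable, Sequence


def monitor_step(
    predicted: Sequence[float],
    observed: Sequence[float],
    threshold: float,
    slow_verifier: Callable[[], bool],
) -> str:
    """Return 'ok', 'false_alarm', or 'failure' for one control step."""
    # Stage 1: cheap check, here the mean absolute prediction error (an assumption).
    error = sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
    if error <= threshold:
        return "ok"
    # Stage 2: the expensive verifier runs only when the threshold is exceeded.
    return "failure" if slow_verifier() else "false_alarm"
```

The design choice being illustrated is that the slow verifier adds latency only on flagged steps, so the average per-step cost stays close to that of the lightweight monitor.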

RA-L 2026-03-26

Self-Supervised Adaptive Transformer for Surgical Step Recognition in Robotic-Assisted Radical Prostatectomy

Yiru Ye, Wenlong Wang, Yonghao Long, Chi-Fai Ng, QingYin Zhou, Dongren Yang, et al.

Manipulation & Robot Arms · Robot Learning · Perception & Sensing · Medical / Soft / Micro-Nano
Abstract

The automatic recognition of surgical steps is essential for enhancing situational awareness and workflow automation in robotic-assisted surgery. However, existing vision-based approaches exhibit limitations in effectively leveraging rich spatial-temporal information from surgical videos, particularly when addressing generalization challenges across different clinical centers, surgeons, and patient populations. Current methods struggle with domain adaptation when deployed in diverse real-world settings due to substantial variations in surgical techniques, anatomical presentations, imaging conditions (such as lighting and white balance), and data preprocessing protocols. To address these limitations, we propose ProstaFormer (Prostatectomy Steps Transformer with Adaptive Feature Fusion), a framework that integrates a vision learner pre-trained via MAE with transformer-based temporal features through an adaptive fusion mechanism. Our approach intelligently combines spatial and temporal representations using position-aware attention weighting, enabling robust recognition of complex surgical workflow patterns across diverse clinical environments. Furthermore, we incorporate a diffusion-based Temporal Adaptation module to rectify domain-specific temporal order differences. We curate and annotate the comprehensive RPSteps dataset and conduct extensive experiments on both GraSP and RPSteps datasets. ProstaFormer consistently outperforms strong baselines, demonstrating improved generalization to different hospitals, surgeons, and patient populations, as well as superior robustness to image degradation. These results highlight the potential of adaptive feature fusion, temporal adaptation, and self-supervised visual pre-training for advancing intelligent robotic-assisted surgical workflow recognition. 
The source code and supplementary materials (including detailed pre-training setups and extended experimental results) are available at https://github.com/ProstaFormer/ProstaFormer.git .

RA-L 2026-03-26

Generalizable Hierarchical Skill Learning via Object-Centric Representation

Haibo Zhao, Yu Qi, Boce Hu, Yizhe Zhu, Ziyan Chen, Heng Tian, et al.

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

We present Generalizable Hierarchical Skill Learning (GSL), a novel framework for hierarchical policy learning that significantly improves policy generalization and sample efficiency in robot manipulation. One core idea of GSL is to use object-centric skills as an interface that bridges the high-level vision-language model and the low-level visual-motor policy. Specifically, GSL decomposes demonstrations into transferable and object-canonicalized skill primitives using foundation models, ensuring efficient low-level skill learning in the object frame. At test time, the skill-object pairs predicted by the high-level agent are fed to the low-level module, where the inferred canonical actions are mapped back to the world frame for execution. This structured yet flexible design leads to substantial improvements in sample efficiency and generalization of our method across unseen spatial arrangements, object appearances, and task compositions. In simulation, GSL trained with only 3 demonstrations per task outperforms baselines trained with 30 times more data by 15.5% on unseen tasks. In real-world experiments, GSL also surpasses the baseline trained with 10 times more data.

JFR 2026-04-28

Exploring the Use and Impact of Composite Materials in Robotics: A Systematic Review

Doglas Negri, Gabriela Wessling Oening, Felipe Augusto Carvalho de Faria, Sarah Maria Schroeder, Ricardo De Medeiros

Medical / Soft / Micro-Nano
Abstract

Lightweight and high‐strength materials are important in robotics, as structural design impacts efficiency, payload capacity, and energy consumption. Composite materials, with their superior stiffness‐to‐weight ratios and multifunctional properties, offer clear advantages over conventional metals and polymers. This review critically examines the use of composites in robotics, with a focus on their structural, active, and sensory functions. A systematic literature review based on an adapted PRISMA framework identified over 100 publications from 1988 to 2026, revealing continuous research growth and rising interest in soft robotics, piezoelectric composites, and carbon‐based structures. In contrast, bioinspired systems have declined due to integration and manufacturing challenges. Despite the progress, major barriers remain, particularly scalability, long‐term reliability, and cost. This work proposes a functional taxonomy of composites in robotics and outlines future directions, including sustainable bio‐based materials, multimaterial additive manufacturing, and 4D‐printed adaptive systems. By integrating materials science and robotics, this review provides a concise roadmap, in composite structures, for developing high‐performance, multifunctional, and sustainable robotic technologies.

RA-L 2026-04-06

Model-Predictive Vibration Reduction of a Suspended Cable-Driven Parallel Robot

Sophie Rousseau, Nicolò Pedemonte, François Chaumette, Stéphane Caro

Control & Dynamics
Abstract

This letter focuses on vibration reduction for Cable-Driven Parallel Robots. A Non-linear Model Predictive Controller (NMPC) is designed and implemented to adapt the control inputs sent to the robot actuators. This NMPC is based on an elasto-dynamic model of the robot, which is discretized and included in a constrained optimal control problem. Two experiments are conducted, and the performance of the NMPC is compared with a Computed Torque Controller and an NMPC with an inelastic model. The first is a trajectory-following task and shows that the proposed controller significantly reduces the amplitude of the vibrations of the Moving-Platform along the trajectory. The second focuses on a sudden braking situation and shows that the proposed control scheme divides the amplitude of the post-braking oscillations by factors ranging from 1.5 to 22, demonstrating the value of this controller for perturbation rejection.

RA-L 2026-04-06

Online Trajectory Optimization for Arbitrary-Shaped Mobile Robots via Polynomial Separating Hypersurfaces

Shuoye Li, Zhiyuan Song, Yulin Li, Zhihai Bi, Jun Ma

Navigation / SLAM / Autonomous Driving
Abstract

An emerging class of trajectory optimization methods enforces collision avoidance by jointly optimizing the robot's configuration and a separating hyperplane. However, as linear separators only apply to convex sets, these methods require convex approximations of both the robot and obstacles, which becomes an overly conservative assumption in cluttered and narrow environments. In this work, we unequivocally remove this limitation by introducing nonlinear separating hypersurfaces parameterized by polynomial functions. We first generalize the classical separating hyperplane theorem and prove that any two disjoint bounded closed sets in Euclidean space can be separated by a polynomial hypersurface, serving as the theoretical foundation for nonlinear separation of arbitrary geometries. Building on this result, we formulate a nonlinear programming (NLP) problem that jointly optimizes the robot's trajectory and the coefficients of the separating polynomials, enabling geometry-aware collision avoidance without conservative convex simplifications. The optimization remains efficiently solvable using standard NLP solvers. Simulation and real-world experiments with nonconvex robots demonstrate that our method achieves smooth, collision-free, and agile maneuvers in environments where convex-approximation baselines fail.
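A toy example may help show why polynomial separators cover cases that hyperplanes cannot. In the sketch below, obstacle samples form a ring around the robot samples, so no line can separate the two sets, while the degree-2 polynomial p(x, y) = x^2 + y^2 - 1 does. The point sets, the hand-picked polynomial, and the helper `separates` are assumptions for illustration; the paper optimizes the polynomial coefficients jointly with the trajectory, which is not shown here.

```python
import math


def separator(x, y):
    """Degree-2 polynomial whose zero level set is the unit circle."""
    return x * x + y * y - 1.0


def half_plane(x, y):
    """One candidate linear separator (the line x = 1), for comparison."""
    return x - 1.0


def separates(poly, inside_points, outside_points):
    """True if poly is negative on every inside point and positive on every outside point."""
    return (all(poly(x, y) < 0.0 for x, y in inside_points)
            and all(poly(x, y) > 0.0 for x, y in outside_points))


# Hypothetical samples: robot body near the origin, obstacles on a ring of radius 2.
robot_points = [(0.0, 0.0), (0.3, 0.2), (-0.4, 0.1)]
obstacle_points = [
    (2.0 * math.cos(k * math.pi / 4), 2.0 * math.sin(k * math.pi / 4))
    for k in range(8)
]
```

Calling `separates(separator, robot_points, obstacle_points)` succeeds, whereas the same call with `half_plane` fails because obstacle samples lie on both sides of any line through the scene.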

RA-L 2026-04-06

A Kind of Flexible Forceps With High Clamping Force and Large Opening Angle for Endoscopic Surgery

Yuan Xing, Fengchuan Liang, Mengjie Chen, Kang Kong, Tao Liang

Medical / Soft / Micro-Nano
Abstract

In endoscopic submucosal dissection, endoscopic forceps is subject to stringent requirements regarding flexibility, miniaturization, and mechanical performance. However, most existing endoscopic forceps suffer from insufficient clamping force and limited opening angle. In this letter, we analyze the influence of the jaw configuration on the clamping force and opening angle of flexible forceps, and develop an innovative composite-groove jaw configuration that achieves a high clamping force of 2.57 N and a large opening angle of 105°, while maintaining a compact outer diameter of 2.8 mm. Additionally, the forceps provides 360° independent distal rotation, allowing flexible orientation under bent configurations. The kinematic model of the forceps, established based on screw theory, defines a workspace that approximates a hemisphere with a radius of 15 mm. Its modular design facilitates rapid instrument replacement during surgery. The ex vivo experiment on a fresh porcine stomach confirms its ability to realize effective tissue traction and surgical field exposure during endoscopic submucosal dissection, demonstrating its potential to improve efficiency and safety in endoscopic surgery.

RA-L 2026-03-31

VistaDepth: Improving Far-Range Depth Estimation With Spectral Modulation and Adaptive Reweighting

Mingxia Zhan, Li Zhang, Yingjie Wang, Xiaomeng Chu, Beibei Wang, Yanyong Zhang

Robot Learning · Perception & Sensing
Abstract

Monocular depth estimation infers per-pixel depth from a single RGB image. It remains particularly challenging in far-range regions, where sparse observations and long-tailed depth distributions bias learning toward near-range content. Diffusion models offer a promising alternative to discriminative foundation models by leveraging rich generative priors for zero-shot generalization. However, existing methods still struggle to recover fine distant structures under weak supervision and limited image evidence. To address this, we propose VistaDepth, a diffusion framework for far-range depth estimation. It improves far-range prediction through two complementary mechanisms: Latent Frequency Modulation (LFM), which refines latent features via dynamic, content-aware spectral filtering, and BiasMap, which adaptively reweights the diffusion objective toward distant, structurally informative regions while remaining aligned with denoising. Extensive experiments on five benchmarks show that VistaDepth achieves state-of-the-art performance among diffusion-based methods. Despite training on orders of magnitude less data than discriminative baselines, it remains competitive globally and excels in challenging far-range regions.

RA-L 2026-03-25

On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting

Niklas Funk, Changqi Chen, Tim Schneider, Georgia Chalvatzaki, Roberto Calandra, Jan Peters

Manipulation & Robot Arms · Robot Learning · Perception & Sensing
Abstract

The field of robotic manipulation has advanced significantly in recent years. At the sensing level, several novel tactile sensors have been developed, capable of providing accurate contact information. On a methodological level, learning from demonstrations has proven an efficient paradigm to obtain performant robotic manipulation policies. The combination of both holds the promise to extract crucial contact-related information from the demonstration data and actively exploit it during policy rollouts. However, this integration has so far been underexplored, most notably in dynamic, contact-rich manipulation tasks where precision and reactivity are essential. This work therefore proposes a multimodal, visuotactile imitation learning framework that integrates a modular transformer architecture with a flow-based generative model, enabling efficient learning of fast and dexterous manipulation policies. We evaluate our framework on the dynamic, contact-rich task of robotic match lighting - a task in which tactile feedback influences human manipulation performance. The experimental results highlight the effectiveness of our approach and show that adding tactile information improves policy performance, thereby underlining their combined potential for learning dynamic manipulation from few demonstrations.

RA-L 2026-03-25

A Semantic-Aware Framework for Safe and Intent-Integrative Assistance in Upper-Limb Exoskeletons

Yu Chen, Shu Miao, Chunyu Wu, Jingsong Mu, Bo Ouyang, Xiang Li

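Manipulation & Robot Arms · Robot Learning · Medical / Soft / Micro-Nano · Human-Robot Interaction / Teleoperation · Control & Dynamics
Abstract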

Upper-limb exoskeletons are primarily designed to provide assistive support by accurately interpreting and responding to human intentions. In home-care scenarios, exoskeletons are expected to adapt their assistive configurations based on the semantic information of the task, adjusting appropriately in accordance with the nature of the object being manipulated. However, existing solutions often lack the ability to understand task semantics or collaboratively plan actions with the user, limiting their generalizability. To address this challenge, this paper introduces a semantic-aware framework that integrates large language models into the task planning framework, enabling the delivery of safe and intent-integrative assistance. The proposed approach begins with the exoskeleton operating in transparent mode to capture the wearer's intent during object grasping. Once semantic information is extracted from the task description, the system automatically configures appropriate assistive parameters. In addition, a diffusion-based anomaly detector is used to continuously monitor the state of human-robot interaction and trigger real-time replanning in response to detected anomalies. During task execution, online trajectory refinement and impedance control are used to ensure safety and regulate human-robot interaction. Experimental results demonstrate that the proposed method effectively aligns with the wearer's cognition, adapts to semantically varying tasks, and responds reliably to anomalies.

RA-L 2026-03-25

Dynamics-Based Visual Tracking of Robot Pose With Camera Extrinsic Online Calibration

Jingxuan Zhang, Kai Nie, Beixian Lai, Zhiwen Li, Yongping Pan

Manipulation & Robot Arms · Perception & Sensing · Control & Dynamics
Abstract

Visual servoing provides the flexibility of robot control in dynamic environments, but three-dimensional (3-D) robot visual servoing is challenging due to the 2-D nature of the image space. Adaptive homography-based visual servoing (HBVS) is effective for 3-D robot pose control under various parametric uncertainties, but the robot dynamics is seldom considered under uncalibrated camera parameters. This paper proposes a dynamics-based adaptive HBVS method for 3-D pose tracking of robot manipulators using an eye-to-hand monocular camera with unknown extrinsic parameters, where a composite learning law is designed to achieve parameter convergence without the stringent persistent excitation condition. Compared with existing adaptive HBVS methods, the proposed method accounts for robot dynamics in the control design to enhance control performance, calibrates extrinsic parameters online under a weak condition of interval excitation, and does not require measuring the rotation of a reference plane or finding valid vanishing points. Experiments on an articulated robot with 7 degrees of freedom, named Franka Emika Panda, have verified that the proposed method performs well for both camera extrinsic online calibration and robot 3-D pose tracking.

RA-L 2026-03-25

Grasp to Act: Dexterous Grasping for Tool Use in Dynamic Settings

Harsh Gupta, Mohammad Amin Mirzaee, Wenzhen Yuan

Manipulation & Robot Arms · Robot Learning · Control & Dynamics
Abstract

Achieving robust grasping with dexterous hands remains challenging, especially when manipulation involves dynamic forces such as impacts, torques, and continuous resistance, situations common in real-world tool use. Existing methods largely optimize grasps for static geometric stability and often fail once external forces arise during manipulation. We present Grasp-to-Act, a hybrid system that combines physics-based grasp optimization with reinforcement-learning-based grasp adaptation to maintain stable grasps throughout functional manipulation tasks. Our method synthesizes robust grasp configurations informed by human demonstrations and employs an adaptive controller that residually issues joint corrections to prevent in-hand slip while tracking the object trajectory. Grasp-to-Act enables robust zero-shot sim-to-real transfer across five dynamic tool-use tasks (hammering, sawing, cutting, stirring, and scooping), consistently outperforming baselines. Across simulation and real-world hardware trials with a 16-DoF dexterous hand, our method reduces translational and rotational in-hand slip and achieves the highest task completion rates, demonstrating stable functional grasps under dynamic, contact-rich conditions.

RA-L 2026-03-25

Learning Robot Visual Navigation in Crowds via Intention-Aware Scene Representations

Han Bao, Bingyi Xia, Hanjing Ye, Yu Zhan, Hao Cheng, Baozhi Jia, et al.

Navigation / SLAM / Autonomous Driving · Robot Learning · Perception & Sensing
Abstract

Robot crowd navigation requires the ability to infer human intentions while accounting for the structural constraints of the environment. Currently, deep reinforcement learning (DRL) provides a promising method for learning navigation policies that understand human intentions. However, most DRL methods rely on limited scene representations, treating pedestrians as simple 2D points and ignoring rich visual cues from both humans and the environment. To address this issue, iCrowdNav, a novel visual crowd navigation method with intention-aware scene representations, is introduced to encode behavioral and structural context from egocentric visual observations. Our method employs two key components: a spatio-temporal encoder for extracting occupancy features of the scene, and Intent-Interact Former (I2Former), an attention-based module that encodes human poses to infer pedestrians' motion intentions. These features are integrated into a compact state embedding that supports effective DRL policy training. Extensive experiments show that our method achieves superior performance over baselines, and real-world deployment demonstrates vision-based crowd navigation.

RA-L 2026-03-25

A Continuum Robot Based on a Conical Helical Configuration for Adaptive Grasping of Irregular Objects

LinFeng Li, Shouzhong Li, Maorong Wang, Yitao Chen, Xinyu Kuang, Junli Li, et al.

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving · Medical / Soft / Micro-Nano
Abstract

This letter presents a novel continuum robot designed to overcome the limitations of existing winding-based grasping methods. The robot is based on a conical spiral curve design and consists of multiple 2-degree-of-freedom flexible joints, enabling lateral helical gripping of irregular objects. A structural parameter model of the robot is established to clarify the mapping between the geometric characteristics of the conical helical curve and the physical structural parameters of the robot. The kinematic model is established using the D–H method, and the static model is derived based on the Lagrange equation. Finite element simulations are conducted to establish the relationship between tendon driving force and tendon length variation. A physical prototype is fabricated and the performance of the robot is systematically validated through experiments. The results demonstrate that the robot possesses stable posture control and adaptive grasping capability, enabling effective grasping of irregular objects with diameters ranging from 55 mm to 100 mm. The total mass of the robot is 190 g, and the maximum lateral grasping load reaches 459 g. Overall, this study provides an effective and innovative solution for addressing the winding grasping of irregular objects.

JFR 2026-04-27

Developing an Experimental In Situ Floating Buoy to Investigate the Impacts of Future Floating Wind Farms

Lou Gaillard, Pierre Lefèvre, Anne Tessier, Hervé Glotin, Lisa Ferré, Serge Planes

Control & Dynamics
Abstract

Offshore wind farms are expanding rapidly in response to climate change. Along the French Mediterranean coastline, floating wind farms are favored, yet their environmental and socio‐economic impacts remain poorly understood. The EolBio project aims to provide insights by developing and deploying an instrumented buoy to evaluate the potential effect of offshore wind farms on the local marine environment. Located in the Northwestern Gulf of Lion (Mediterranean Sea), the buoy operated over a 3‐year period in the vicinity of the future floating wind farm, Eolmed. It integrates a suite of cutting‐edge sensors to continuously record physical, chemical, and biological parameters, enabling real‐time and long‐term environmental characterization. This device represents a novel robotic approach to ecosystem monitoring in offshore environments. The in situ collected data, in addition to characterizing the modifications generated by the introduction of a floating structure, serve as inputs for trophic ecosystem models to simulate the spatiotemporal dynamics of marine food webs under different wind farm scenarios. This modelling work aims to anticipate the ecological and socio‐economic consequences of floating offshore wind farms and help inform sustainable deployment strategies. Here, we present the design, deployment, and operational process of the EolBio buoy system, along with a methodological workflow combining laboratory analyses, artificial intelligence, and ecosystem modelling. This approach provides a unique framework for assessing the early impacts of offshore infrastructures on marine ecosystems and illustrates how instrumented buoys can play a central role in large‐scale environmental monitoring.

JFR 2026-04-21

Biomimetic Multifinger Tactile Sensing and Contact‐Regulated Palpation for Autonomous Breast Tumor Localization

Kai Cheng, Yuanyuan Shen, Huanhai Zhang, Baifeng Li, Chao Cheng, Hamid Reza Karimi, et al.

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Early detection of breast abnormalities remains challenging: manual palpation is subjective and operator‐dependent, while imaging modalities may miss small or subtle stiffness anomalies. This paper presents a biomimetic multifinger robotic palpation approach intended to support early breast‐cancer screening and follow‐up assessment as a proof‐of‐concept. The system integrates tactile arrays with a contact‐regulation scheme under standardized protocols. Each fingertip produces spatial tactile heatmaps, and a human‐like multifinger pressing strategy is used to elicit finger‐wise normal‐interaction responses under controlled contact conditions. The feedback variable is a resultant tactile signal obtained by aggregating taxel readings and is treated as a proxy of normal interaction rather than an absolute force measurement. A real‐time Kalman filter is employed to improve signal fidelity during dynamic contact. The platform is validated on breast‐inspired silicone phantoms with embedded rigid inclusions at varying depths and orientations. Across the tested scenarios, the system achieves repeatable real‐time localization, with stiffness‐weighted centroid errors within 10 mm of the nominal inclusion coordinates and a low incidence of spurious detections under the standardized protocol. We clarify that this study is a proof‐of‐concept focusing on stiffness anomaly localization under controlled phantom conditions rather than clinical diagnosis or benign/malignant classification.
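The centroid-based localization mentioned above can be illustrated with a weighted centroid over a tactile heatmap. This is a minimal sketch, assuming a rectangular taxel grid with uniform pitch and non-negative readings; the function name `weighted_centroid`, the `pitch_mm` parameter, and the use of raw taxel values as weights (in place of the paper's stiffness-derived weights, which the abstract does not specify) are all assumptions.

```python
def weighted_centroid(heatmap, pitch_mm=1.0):
    """Return the (x_mm, y_mm) centroid of a 2-D grid of non-negative taxel readings.

    Columns map to x and rows map to y; returns None when no contact is registered.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for row_idx, row in enumerate(heatmap):
        for col_idx, value in enumerate(row):
            total += value
            sum_x += value * col_idx
            sum_y += value * row_idx
    if total <= 0.0:
        return None  # nothing pressed, so there is no meaningful centroid
    return (pitch_mm * sum_x / total, pitch_mm * sum_y / total)
```

A localization error of the kind reported in the abstract would then be the distance between this estimate and the nominal inclusion coordinates.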

RA-L 2026-03-30

MBD-Planner: A Real-Time Obstacle Avoidance Framework for UAVs via Feature-Domain Motion Blur Decoupling

Sitian Peng, Rui Wang, Yu Liang, Longwei Wang

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving
Abstract

To address UAV obstacle avoidance under motion blur, we propose Feature-Domain Motion Blur Decoupling Planner (MBD-Planner), a real-time feature-domain motion blur decoupling framework that jointly disentangles blurred visual features and optimizes trajectories in an end-to-end manner. Unlike methods relying on explicit 3D reconstruction or ignoring blur in planning, MBD-Planner introduces a lightweight dewarping module that transforms blurred features into virtual sharp ones via depth back-projection and pose compensation, achieving sub-millisecond latency. A blur-aware optimization objective further couples camera motion and trajectory smoothing, while privileged training guided by ESDF gradients enables robust learning without expert demonstrations. Experimental results demonstrate that MBD-Planner enables reliable, smooth, and real-time navigation in visually degraded environments, effectively bridging the gap between theoretical planning and robust navigation.

RA-L 2026-03-30

Adaptive Collision Detection via Impulse-Momentum Theorem for Safe Sensorless Physical Human-Robot Interaction

Hongzhe Shi, Chao Ye, Chenlu Liu, Weiyang Lin

Perception & Sensing · Human-Robot Interaction / Teleoperation
Abstract

Collision detection is critical for safe physical human-robot interaction (pHRI). The generalized momentum observer (GMO) is a prevalent sensorless method, estimating momentum deviations to detect collisions. However, its reliance on iterative solutions and static thresholds severely limits operational efficiency and dynamic adaptability, leading to reduced sensitivity and accuracy in complex collision scenarios. To overcome these limitations, this paper proposes an impulse-based dynamic threshold generalized momentum observer (IDT-GMO) for adaptive sensorless collision detection. Key innovations include: (i) An impulse-based dynamic threshold mechanism, leveraging the impulse-momentum theorem, enabling adaptive collision detection with enhanced sensitivity and accuracy; (ii) A closed-form analytical solution, formulated through Lagrangian mechanics and Lie group theory, eliminating iterative computation and achieving 45% faster processing than conventional GMO. Experimental validation across soft contact, rigid impact, and multi-contact scenarios confirms that IDT-GMO achieves superior detection sensitivity and accuracy compared to existing methods. Thus, IDT-GMO exhibits significant potential for applications ranging from delicate contact tasks to collision-prone environments.
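For readers unfamiliar with the baseline, the sketch below simulates a conventional generalized momentum observer on a 1-DoF rigid joint with no gravity or friction. It shows only the standard residual, which converges to the external torque; the impulse-based dynamic threshold and the closed-form solution of IDT-GMO are not reproduced. The function name and all numeric parameters are assumptions chosen for illustration.

```python
def simulate_momentum_observer(tau_ext, gain=50.0, dt=0.001, steps=1000,
                               mass=2.0, tau_motor=1.0):
    """Simulate mass * qdd = tau_motor + tau_ext and return the final observer residual."""
    velocity = 0.0
    integral = 0.0  # running integral of (tau_motor + residual)
    residual = 0.0
    for _ in range(steps):
        # Plant: explicit Euler step of the joint dynamics.
        velocity += (tau_motor + tau_ext) / mass * dt
        momentum = mass * velocity
        # Observer: uses only the commanded torque and the measured momentum.
        integral += (tau_motor + residual) * dt
        residual = gain * (momentum - integral)
    return residual
```

In this discrete form the residual behaves like a first-order low-pass filter of the external torque, so a collision would be flagged by comparing its magnitude against a threshold; a fixed threshold is the static baseline that the abstract argues against.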

RA-L 2026-04-09

An Efficient Closed-Form Solution to Full Visual-Inertial State Initialization

Samuel Cerezo, Seong Hun Lee, Javier Civera

Abstract

In this letter, we present a closed-form initialization method that recovers the full visual–inertial state without nonlinear optimization. Unlike previous approaches that rely on iterative solvers, our formulation yields analytical, easy-to-implement, and numerically stable solutions for reliable start-up. Our method builds on small-rotation and constant-velocity approximations, which keep the formulation compact while preserving the essential coupling between motion and inertial measurements. We further propose an observability-driven, two-stage initialization scheme that balances accuracy with initialization latency. Extensive experiments on the EuRoC dataset validate our assumptions: our method achieves 10–20% lower initialization error than optimization-based approaches, while using 4× shorter initialization windows and reducing computational cost by 5×.

RA-L 2026-04-09

State and Contact Force Estimation for Self-Balancing Electric Bicycles

Qufei Zhang, Hua Huang, Jiaming Xiong, Caishan Liu

Abstract

This letter proposes a reliable algorithm to estimate both the state and the wheel-ground contact interaction, which is essential for the stable control of the electric bicycle. The whole electric bicycle is modeled as a rigid four-body structure, and the Lagrangian equations of the first kind are derived. Combining Inertial Measurement Unit (IMU) and Global Positioning System (GPS) measurements, the proposed algorithm extends the Extended Kalman Filter (EKF) to the hybrid state space composed of the SE(3) group and the $\mathbb{R}^{3}$ domain. The Coulomb contact model is simultaneously integrated to estimate the contact forces. The algorithm can accurately capture changes in wheel-ground contact and, with weak reliance on sensors, detect slip and lift-off, enabling effective estimation of the state and contact forces, which are then employed in the closed-loop control framework to achieve trajectory tracking while maintaining self-balancing. Simulations and experiments are conducted to demonstrate the accuracy, efficiency, effectiveness, and robustness of the estimation algorithm, as well as the practical performance of the estimation-based control framework.

RA-L 2026-03-27

Continual Reinforcement Learning Framework for Scalable Collision Avoidance and Mitigation System With Packing Strategy

Joonhee Lim, Jangho Shin, Dongsuk Kum

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Collision Avoidance and Mitigation System (CAMS) in autonomous driving systems is crucial for ensuring safety by formulating strategies to address various collisions and planning the trajectory accordingly. Although recent learning-based motion planning methods for CAMS have shown promising results for specific collision scenarios, the question of how to continually scale up their knowledge across different driving environments has not yet been thoroughly investigated. In this paper, we propose a scalable learning framework for CAMS that combines deep reinforcement learning and continual learning. By sequentially packing knowledge of new scenarios into a model trained on specific scenarios, the proposed method continually scales up the range of scenarios the model can handle while maintaining performance on previously trained scenarios. We apply this learning framework to a motion planner for CAMS and evaluate its scalability across new driving environments. This approach demonstrates a significant ability to pack knowledge by outperforming a model that jointly trains on all scenarios—typically considered an upper bound for continual learning—in about 2.4× fewer training steps, with a 4.19% higher collision avoidance rate and a 12.02% lower collision impact. Moreover, our approach achieves performance close to models trained on each scenario individually, with a model about 5.8× smaller.

RA-L 2026-03-27

Benchmarking Full-Stack ROS 2 Simulation Platforms for Mobile Robots

Ángel Soriano, Román Navarro, Juan Manuel Navarrete, Robert Vasquez, Rafael Martín, Carlos Gascón, et al.

Navigation / SLAM / Autonomous Driving · Perception and Sensing
Abstract

Following the official end-of-life of ROS 1 in May 2025, the robotics community is rapidly migrating to ROS 2, necessitating a rigorous re-evaluation of available simulation tools. This paper presents a comparative benchmarking of five widely used simulators: Gazebo (Harmonic), Webots (R2025a), Unity (6.1), NVIDIA Isaac Sim (4.5.0), and Open 3D Engine (O3DE). Using a standardized high-performance workstation and a unified mobile robot model equipped with stereo cameras, 3D LiDAR, and IMU, each platform is evaluated across distinct environments. Quantitative metrics reported include Startup Time, Real-Time Factor (RTF), and computational footprint (CPU, RAM, and GPU usage). Results reveal a clear trade-off between efficiency and visual fidelity: “ROS 2-first” engines like Webots and Gazebo demonstrate competitive startup times and low resource consumption, making them suitable for iterative development and CI/CD pipelines. Conversely, “Graphics-first” engines like Isaac Sim and Unity offer photorealistic rendering suitable for perception training, but at a significantly higher computational cost. Additionally, O3DE is analyzed as a promising native-ROS 2 alternative. To facilitate independent verification and future research, the complete benchmarking suite, including scenes and measurement scripts, is released as an open-source repository.

RA-L 2026-03-27

Lift-Augmentation Aware Backstepping Control for an Underactuated Bird-Scale Flapping Wing Vehicle

Shijun Zhou, Aidan Orr, Nak-Seung P. Hyun

UAVs / Aerial Robots · Control and Dynamics
Abstract

In this paper, we aim to design a nonlinear controller to achieve altitude tracking and pitch stabilization for a class of underactuated nonlinear systems, namely, bird-scale flapping-wing vehicles (BFWVs) with a single actuator. The major challenge lies in the underactuated nonlinear system dynamics and the dependency of lift generation on forward translational velocity. In this paper, a lift-augmentation-aware reduced-order model for a class of BFWV incapable of hovering is introduced with an emphasis on the sagittal plane (body $YZ$ plane) dynamics. A feasible fixed point for altitude control of the underactuated system is studied, and a Lyapunov-based switching controller is developed to force the states to stabilize at the desired altitude. We demonstrate its effectiveness in both numerical simulations and closed-loop indoor experiments and address the challenges of controlling the underactuated BFWV.

RA-L 2026-03-27

GVI-Switch: A High-Precision GNSS-Visual-Inertial State Estimator for Indoor-Outdoor Environments

Tong Zhang, Kezhen Zhao, Bize Zhou, Heng Gao

Perception and Sensing · Control and Dynamics
Abstract

We propose a high-precision GNSS-visual-inertial state estimator named GVI-Switch, augmented with a loosely coupled Error State Kalman Filter (ESKF), to achieve stable six-degree-of-freedom (6-DoF) pose estimation under varying GNSS signal conditions. First, we developed a GNSS degradation detection scheme based on commonly observed GNSS degradation phenomena. This scheme effectively identifies and eliminates abnormal GNSS signals. During the joint initialization phase, we account for the noise amplification in GNSS signals caused by the carrier's passive movement and propose a method to detect and exclude signals that are severely affected by noise. In the loosely coupled ESKF process, to handle GNSS signal fluctuations during transitions between indoor and outdoor environments, we designed a feedback buffering mechanism that constrains the error state vector. This mechanism not only mitigates the accumulation of errors in a degenerate system but also ensures the stability of positioning results during GNSS status changes. Experimental results demonstrate that our algorithm achieves high-precision, stable, and continuous global positioning outdoors, with an Absolute Trajectory Error (ATE) reduction of up to 70% compared to state-of-the-art (SoTA) GNSS fusion algorithms.

AuRo 2026-04-17

Online learning for vibration suppression in physical robot interaction using power tools

Gokhan Solak, Arash Ajoudani

Human-Robot Interaction / Teleoperation · Control and Dynamics
Abstract

Vibration suppression is an important capability for collaborative robots deployed in challenging environments such as construction sites. We study the active suppression of vibration caused by external sources such as power tools. We adopt the band-limited multiple Fourier linear combiner (BMFLC) algorithm to learn the vibration online and counter it by feedforward force control. We propose the damped BMFLC method, extending BMFLC with a novel adaptive step-size approach that improves the convergence time and noise resistance. Our logistic function-based damping mechanism decreases the effective step-size for weaker frequency components while allowing larger step-sizes for relevant components, resulting in improved vibration suppression performance. We evaluate our method on extensive simulation experiments with realistic time-varying multi-frequency vibration and real-world physical interaction experiments. The simulation experiments show that our method improves the suppression rate in comparison to the original BMFLC and its recursive least squares and Kalman filter-based extensions. Furthermore, our method is far more efficient than the latter two. We further validate the effectiveness of our method in real-world polishing experiments.
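
Editor's sketch: the abstract builds on the band-limited multiple Fourier linear combiner (BMFLC), which models a vibration as a weighted sum of sines and cosines at fixed frequencies inside a band and adapts the weights online with an LMS rule. The code below is a minimal plain BMFLC only, assuming a uniformly sampled scalar signal; it does not include the paper's logistic damping of the step size. The function name `bmflc_track` and all parameter values are hypothetical choices for illustration.

```python
import math

def bmflc_track(signal, dt, f_min, f_max, df, mu):
    """Track `signal` with a plain BMFLC and return the per-sample estimates.

    The band [f_min, f_max] is split into components spaced by df (Hz).
    mu is the LMS step size (assumed constant here, unlike the damped variant).
    """
    n = int(round((f_max - f_min) / df)) + 1
    freqs = [f_min + i * df for i in range(n)]
    weights = [0.0] * (2 * n)  # n sine weights followed by n cosine weights
    estimates = []
    for k, sample in enumerate(signal):
        t = k * dt
        # Reference vector: sin and cos of every frequency in the band.
        x = [math.sin(2.0 * math.pi * f * t) for f in freqs]
        x += [math.cos(2.0 * math.pi * f * t) for f in freqs]
        # Estimate made before seeing the current sample.
        y = sum(w * xi for w, xi in zip(weights, x))
        err = sample - y
        # LMS update of the Fourier weights.
        weights = [w + 2.0 * mu * err * xi for w, xi in zip(weights, x)]
        estimates.append(y)
    return estimates

if __name__ == "__main__":
    dt = 0.001
    vib = [math.sin(2.0 * math.pi * 10.0 * k * dt) for k in range(5000)]
    est = bmflc_track(vib, dt, f_min=8.0, f_max=12.0, df=1.0, mu=0.02)
    print("last-sample error:", abs(vib[-1] - est[-1]))
```

In a feedforward scheme like the one the abstract describes, the estimate would be negated and sent as a force command; that control loop is outside this sketch.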

RA-L 2026-03-26

Class-Distribution Guided Active Learning for 3D Occupancy Prediction in Autonomous Driving

Wonjune Kim, In-Jae Lee, Sihwan Hwang, Sanmin Kim, Dongsuk Kum

Navigation / SLAM / Autonomous Driving · Perception and Sensing
Abstract

3D occupancy prediction provides dense spatial understanding critical for safe autonomous driving. However, this task suffers from a severe class imbalance due to its volumetric representation, where safety-critical objects (bicycles, traffic cones, pedestrians) occupy minimal voxels compared to dominant backgrounds. Additionally, voxel-level annotation is costly, yet dedicating effort to dominant classes is inefficient. To address these challenges, we propose a class-distribution guided active learning framework for selecting training samples to annotate in autonomous driving datasets. Our approach combines three complementary criteria to select the training samples. Inter-sample diversity prioritizes samples whose predicted class distributions differ from those of the labeled set, intra-set diversity prevents redundant sampling within each acquisition cycle, and frequency-weighted uncertainty emphasizes rare classes by reweighting voxel-level entropy with inverse per-sample class proportions. We ensure evaluation validity by using a geographically disjoint train/validation split of Occ3D-nuScenes, which reduces train-validation overlap and mitigates potential map memorization. With only 42.4% labeled data, our framework reaches 26.62 mIoU, comparable to full supervision and outperforming active learning baselines at the same budget. We further validate generality on SemanticKITTI using a different architecture, demonstrating consistent effectiveness across datasets.
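
Editor's sketch: one way to read "reweighting voxel-level entropy with inverse per-sample class proportions" is shown below. It assumes each voxel is weighted by the inverse share of its predicted class within the same sample, and that the sample score is the mean weighted entropy. The abstract does not give the exact formula, so the function name `frequency_weighted_uncertainty` and this particular weighting are illustrative assumptions, not the authors' implementation.

```python
import math

def frequency_weighted_uncertainty(voxel_probs, eps=1e-8):
    """Score one sample from per-voxel class probabilities.

    voxel_probs: list of lists, one class-probability vector per voxel.
    Voxels whose predicted class is rare in this sample get a larger weight.
    """
    num_classes = len(voxel_probs[0])
    total = len(voxel_probs)
    # Predicted class per voxel and the resulting per-sample class counts.
    preds = [max(range(num_classes), key=lambda i: p[i]) for p in voxel_probs]
    counts = [0] * num_classes
    for c in preds:
        counts[c] += 1
    score = 0.0
    for p, c in zip(voxel_probs, preds):
        entropy = -sum(q * math.log(q + eps) for q in p)
        weight = total / counts[c]  # inverse per-sample class proportion
        score += weight * entropy
    return score / total

if __name__ == "__main__":
    confident = [0.98, 0.02]
    rare_uncertain = [confident] * 3 + [[0.45, 0.55]]    # uncertain voxel is the rare class
    common_uncertain = [confident] * 3 + [[0.55, 0.45]]  # uncertain voxel is the common class
    print(frequency_weighted_uncertainty(rare_uncertain))
    print(frequency_weighted_uncertainty(common_uncertain))
```

Under these assumptions, a sample whose uncertain voxels belong to a rare class scores higher than one whose equally uncertain voxels belong to the dominant class, which is the selection bias the abstract describes.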

RA-L 2026-03-26

FlexiCup: Wireless Multimodal Suction Cup With Dual-Zone Vision-Tactile Sensing

Junhao Gong, Shoujie Li, Kit-Wa Sou, Changqing Guo, Hourong Huang, Tong Wu, et al.

Manipulation and Robot Arms · Perception and Sensing
Abstract

Conventional suction cups lack sensing capabilities for contact-aware manipulation in unstructured environments. This paper presents FlexiCup, a multimodal suction cup with wireless electronics that integrate dual-zone vision-tactile sensing. The central zone dynamically switches between vision and tactile modalities via illumination control, while the peripheral zone provides continuous spatial awareness. The modular mechanical design supports both vacuum (sustained-contact adhesion) and Bernoulli (contactless lifting) actuation while maintaining the identical dual-zone sensing architecture, demonstrating sensing-actuation decoupling where sensing and actuation principles are orthogonally separable. We validate hardware versatility through dual control paradigms. Modular perception-driven grasping achieves comparable success rates across vacuum (90.0%) and Bernoulli (86.7%) modes using identical sensing and control pipelines, validating the sensing architecture's effectiveness across fundamentally different pneumatic principles. Diffusion-based end-to-end learning achieves 73.3% and 66.7% success on contact-aware manipulation tasks, with ablation studies confirming 13% improvements from multi-head attention coordinating dual-zone observations. Hardware designs, firmware, and experimental videos are available at the companion website: https://flexicup.junhaogong.top .

RA-L 2026-03-26

Backstepping Sliding Mode Control With Enhanced Error-Dependent Observer Switching Mechanism for Knee Exoskeleton

Zhipeng Wang, Xiaofei Hu, Chunzhi Yi, Chifu Yang, Zhen Ding, Litong Lyu, et al.

Medical / Soft / Micro-Nano · Control and Dynamics
Abstract

Exoskeletons are currently widely used for athletic enhancement and rehabilitation training. However, the force loading accuracy of wearable flexible knee exoskeletons is limited by system characteristics and the human-machine interaction. In this paper, an effective force loading control algorithm is proposed. First, a detailed human-machine interaction system model of the exoskeleton is established. Second, an enhanced error-dependent observer switching mechanism is proposed for the exoskeleton force loading system. The stability of a nonlinear extended state observer (NESO) for a third-order force loading system is rigorously proved using the Lyapunov theorem, thereby addressing the difficulty of analyzing the stability of the switched extended state observer (SESO). Based on the SESO, a backstepping sliding mode controller (BSSM) is designed to ensure that the dynamic performance of the force loading is improved with a small switching gain. Finally, comparative experiments verify the effectiveness of the proposed control method, which can effectively suppress parameter perturbations and external disturbances, thereby achieving high-precision tracking.

RA-L 2026-03-26

Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation

Xiucheng Zhang, Yang Jiang, Hongwei Qin, Jiashuo Bai, Maocheng Bai

Manipulation and Robot Arms · Robot Learning
Abstract

Perceptual ambiguity and task conflict limit multi-task robotic manipulation via imitation learning. We propose a framework combining a Language-Conditioned Visual Representation (LCVR) module and a Language-conditioned Mixture-of-Experts Density Policy (LMoE-DP). LCVR resolves perceptual ambiguity by grounding visual features with language instructions, enabling differentiation between visually similar tasks. To mitigate task conflict, LMoE-DP uses a sparse expert architecture to specialize in distinct, multimodal action distributions, stabilized by gradient modulation. Experiments in simulation and on a real robot show consistent improvements over strong multi-task baselines. LMoE-DP achieves 73.4% success on LIBERO and 79% on real-robot tasks. Ablations isolate the contributions of LCVR, sparse specialization, and gradient modulation. Together, these components enable efficient and robust multi-task manipulation.

RA-L 2026-03-26

Parallel Elastic Ankle Actuator With a Unidirectional Clutch for Explosive Motion in Legged Robots

Mingyuan Dou, Ning He, Lile He, Jiaxuan Chen

Legged / Quadruped Robots · Control and Dynamics
Abstract

Integrating parallel elastic elements into legged robots can reduce actuator torque demand and improve energy efficiency. However, elastic energy release at the joint level is often tightly coupled with actuator motion, which, in high-power tasks such as jumping, may cause undesired energy exchange, increased energy consumption, and reduced joint mobility. This paper proposes a Unidirectional Bearing–Decoupling Parallel Elastic Ankle Actuator (UBD-PEAA). The proposed actuator selectively engages the motor and elastic element through a clutch mechanism, enabling passive decoupling that allows rapid elastic energy release without being directly constrained by the actuator's angular velocity limit. Inspired by the energy storage and release function of the human Achilles tendon during dynamic locomotion, the design employs a high-reduction-ratio motor and realizes clutch switching through angular coupling between the ankle joint and a unidirectional bearing. During the stance phase, the clutch remains engaged, enabling high-torque spring preloading under a high reduction ratio with low actuator power consumption. During take-off, the clutch passively disengages, allowing rapid elastic energy release. Single-leg simulations and prototype experiments confirm that the proposed design decouples the conflicting high-torque and high-angular-velocity requirements of ankle actuation in jumping. The ankle actuator weighs only 17% of the combined mass of the hip and knee actuators, while achieving a peak ankle torque of 109% of the hip torque during jumping. Compared with a conventional two-degree-of-freedom leg, the proposed design improves energy efficiency by 19.6% in horizontal jumping tasks and achieves a maximum jump distance of 0.70 m, demonstrating efficient push-off dynamics under high-power conditions.

JFR 2026-04-16

The Shifting Paradigms of Disaster Robotics: Three Decades of Research

Qin Hu, Komagata Tomoko, Kanbara Sakiko, Ting Yu, Yan Jiang

Robot Learning · Perception and Sensing
Abstract

The increasing frequency and magnitude of natural disasters worldwide have raised the demand for advanced robotic systems in disaster response. In this research, a comprehensive bibliometric analysis of 2,201 papers from the Web of Science database was conducted to systematically investigate global research trends, patterns, and emerging frontiers in disaster robotics. CiteSpace, the Bibliometrix R package, and VOSviewer were used for data analysis. The results reveal a fluctuating yet consistently rising publication trend from 1992 to 2025. Asia leads global research output, with Japan as the most productive country but with limited international collaboration, while European countries engage in more extensive international cooperation. The Journal of Field Robotics is identified as the most influential publication in this field. Over time, research on disaster robots has gradually shifted from hardware development to intelligence technologies, such as deep learning and perception‐enabled robotic systems. The study provides valuable insights to guide future research directions and strengthen global collaboration in disaster robotics.

JFR 2026-04-10

Foundation Model‐Driven Grasping of Unknown Objects via Center of Gravity Estimation

Kang Xiangli, Yage He, Xianwu Gong, Zehan Liu, Yuru Bai

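Manipulation and Robot Arms · Navigation / SLAM / Autonomous Driving · Robot Learning · Perception and Sensing · Control and Dynamics
Abstract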

This study presents a grasping method for objects with uneven mass distribution by leveraging diffusion models to localize the center of gravity (CoG) on unknown objects. In robotic grasping, CoG deviation often leads to postural instability, where existing keypoint‐based or affordance‐driven methods exhibit limitations. We constructed a dataset of 790 images featuring unevenly distributed objects with keypoint annotations for CoG localization. A vision‐driven framework based on foundation models was developed to achieve CoG‐aware grasping. Experimental evaluations across real‐world scenarios demonstrate that our method achieves a 49% higher success rate compared to conventional keypoint‐based approaches and an 11% improvement over state‐of‐the‐art affordance‐driven methods. The system exhibits strong generalization with a 76% CoG localization accuracy on unseen objects. This provides an innovative solution for precise and stable grasping tasks, with its scientific validity further validated in complex and dynamic scenarios.

RA-L 2026-04-06

MIGHTY: Hermite Spline-Based Efficient Trajectory Planning

Kota Kondo, Yuwei Wu, Vijay Kumar, Jonathan P. How

Abstract

Hard-constraint trajectory planners often rely on commercial solvers and demand substantial computational resources. Existing soft-constraint methods achieve faster computation, but either (1) decouple spatial and temporal optimization or (2) restrict the search space. To overcome these limitations, we introduce MIGHTY, a Hermite spline-based planner that performs spatiotemporal optimization while fully leveraging the continuous search space of a spline. In simulation, MIGHTY achieves a 9.3% reduction in computation time and a 13.1% reduction in travel time over state-of-the-art baselines, with a 100% success rate. In hardware, MIGHTY completes multiple high-speed flights up to 6.7 m/s in a cluttered static environment and long-duration flights with dynamically added obstacles.
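
Editor's sketch: a Hermite spline segment is defined by its endpoint positions, endpoint derivatives, and duration, so all of these can serve as continuous decision variables in a joint spatial and temporal optimization. The code below evaluates a single cubic Hermite segment as an illustration. The cubic degree, the function names, and the example values are assumptions made for this sketch; the abstract does not state the spline order or parameterization MIGHTY uses.

```python
def hermite_point(p0, v0, p1, v1, T, t):
    """Position on a cubic Hermite segment of duration T at time t in [0, T]."""
    s = t / T
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    # Velocities are scaled by T because the basis is defined on s in [0, 1].
    return tuple(h00 * a + h10 * T * b + h01 * c + h11 * T * d
                 for a, b, c, d in zip(p0, v0, p1, v1))

def hermite_velocity(p0, v0, p1, v1, T, t):
    """Time derivative of the same segment at time t."""
    s = t / T
    d00 = 6 * s**2 - 6 * s
    d10 = 3 * s**2 - 4 * s + 1
    d01 = -6 * s**2 + 6 * s
    d11 = 3 * s**2 - 2 * s
    return tuple((d00 * a + d10 * T * b + d01 * c + d11 * T * d) / T
                 for a, b, c, d in zip(p0, v0, p1, v1))

if __name__ == "__main__":
    p0, v0 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    p1, v1 = (2.0, 1.0, 0.0), (0.0, 1.0, 0.0)
    T = 2.0
    for t in (0.0, 1.0, 2.0):
        print(t, hermite_point(p0, v0, p1, v1, T, t),
              hermite_velocity(p0, v0, p1, v1, T, t))
```

Because the segment interpolates its endpoint states by construction, changing `T` or the endpoint velocities reshapes the path without breaking continuity at the waypoints.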

RA-L 2026-04-06

Design and Control of a Parallel Elastic Actuator With Adjustable Equilibrium Position

Yixi Chen, Evangelos Chatziandreou, Chase W. Mathews, Beau P. Johnson, David J Braun

Abstract

We present an adjustable-equilibrium parallel elastic actuator (AE-PEA) that combines a large direct-drive motor (DDM) and a 3D printed torsional spring with a small motor that can continuously adjust the equilibrium position of the actuator. We demonstrate that the AE-PEA achieves high torque control bandwidth like direct-drive actuators (DDAs), energy storage and shock mitigation like parallel elastic actuators (PEAs), and equilibrium position adjustment similar to series elastic actuators (SEAs). Owing to these features, we foresee the benefit of AE-PEAs in robot joints performing a variety of static load bearing and dynamic oscillatory motion.

RA-L 2026-03-31

Self-Organizing Edge Computing Distribution Framework for Visual SLAM

Jussi Kalliola, Lauri Suomela, Sergio Moreschini, David Hästbacka

Navigation / SLAM / Autonomous Driving
Abstract

Localization in a known environment is an essential capability for mobile robots. Simultaneous Localization and Mapping (SLAM) addresses this by combining real-time tracking with computation-intensive map optimization, which can present a challenge for resource-limited robots. Edge-assisted SLAM approaches that offload heavy computation while maintaining real-time tracking onboard offer a potential solution. In this article, we propose a novel self-organizing VSLAM framework that provides a general structure for distributing existing keyframe-based VSLAM systems across a network of devices. The framework introduces a state management model for handling shared SLAM state among devices and a distribution policy for orchestrating the distribution in a self-organizing manner. To demonstrate the framework, we implemented it for monocular ORB-SLAM3 using a three-layer architecture. The distributed SLAM was evaluated in both fully distributed and standalone configurations and compared against the original ORB-SLAM3. The experimental results show that the proposed framework achieves comparable accuracy and resource utilization to ORB-SLAM3. Moreover, the system can revert to standalone SLAM when network connectivity is lost, demonstrating effective self-organization. Our code is publicly available at: https://github.com/JussiKalliola/VSLAM-distribution-framework

RA-L 2026-03-25

Constraint Learning in Multi-Agent Dynamic Games From Demonstrations of Local Nash Interactions

Zhouyu Zhang, Chih-Yuan Chiu, Glen Chou

Multi-Robot / Swarms · Control and Dynamics
Abstract

We present an inverse dynamic game-based algorithm to learn parametric constraints from a given dataset of local Nash equilibrium interactions between multiple agents. Specifically, we introduce mixed-integer linear programs (MILP) encoding the Karush-Kuhn-Tucker (KKT) conditions of the interacting agents, which recover constraints consistent with the local Nash stationarity of the interaction demonstrations. We establish theoretical guarantees that our method learns inner approximations of the true safe and unsafe sets. We also use the interaction constraints recovered by our method to design motion plans that robustly satisfy the underlying constraints. Across simulations and hardware experiments, our methods accurately inferred constraints and designed safe interactive motion plans for various classes of constraints, both convex and non-convex, from interaction demonstrations of agents with nonlinear dynamics.

RA-L 2026-03-25

Efficient Joint Estimation of Optical Flow and Stereo Disparity With Event Cameras

Muhammad Ahmed Humais, Sajid Javed, Yahya Zweiri

Robot Learning · Perception and Sensing
Abstract

Optical flow and stereo disparity are both fundamental to the perception pipeline of robotic systems, enabling 3D understanding and motion estimation of the robot itself and the dynamic objects around it. In this context, event cameras offer great potential to reduce latency and improve the efficiency of the perception pipeline. However, existing event-based approaches often deal with flow and disparity estimation tasks separately, potentially leading to redundant computations and reduced efficiency. In this work, we developed a joint flow and depth estimation network, featuring shared feature encoders for high computational efficiency. Moreover, we introduce a novel Bidirectional Mamba module to enhance feature expressiveness by increasing the spatial receptive field to capture global context with significantly lower computational overhead than vision transformers. We further improve efficiency by incorporating informed priors to reduce the number of refinement iterations in the sequential estimation task. Additionally, we extend our framework to combine the estimated flow and disparity to predict 3D scene flow, an important task in many robotics applications.

RA-L 2026-03-25

Data Scaling for Navigation in Unknown Environments

Lauri Suomela, Naoki Takahata, Sasanka Kuruppu Arachchige, Harry Edelman, Joni-Kristian Kämäräinen

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Generalization of imitation-learned navigation policies to environments unseen in training remains a major challenge. We address this by conducting the first large-scale study of how data quantity and data diversity affect real-world generalization in end-to-end, map-free visual navigation. Using a curated 4,565-hour crowd-sourced dataset collected across 161 locations in 35 countries, we train policies for point goal navigation and evaluate their closed-loop control performance on sidewalk robots operating in four countries, covering 125 km of autonomous driving. Our results show that large-scale training data enables zero-shot navigation in unknown environments, approaching the performance of policies trained with environment-specific demonstrations. Critically, we find that data diversity is far more important than data quantity. Doubling the number of geographical locations in a training set decreases navigation errors by ∼15%, while performance benefit from adding data from existing locations saturates with very little data. We also observe that, with noisy crowd-sourced data, simple regression-based models outperform generative and sequence-based architectures. We release our policies, evaluation setup and example videos on the project page, lasuomela.github.io/navigation scaling.

RA-L 2026-03-25

A Simulation Platform for MARL Training and Evaluation in Swarm Confrontation

Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lü

Robot Learning · Multi-Robot / Swarms
Abstract

In swarm confrontation, robots must swiftly formulate strategies in transient environments, a challenge well-suited for multi-agent reinforcement learning (MARL). However, existing platforms suffer from the lack of comprehensive confrontation scenario modeling and scalable frameworks, hindering MARL's widespread applications. We introduce a novel platform for training, simulating, and evaluating MARL algorithms in swarm confrontation tasks. It constructs a holistic simulation framework by integrating robot, environment, and rule models for complex confrontation scenarios. Equipped with a decentralized task allocator and path planner for each robot, the platform enables scalable cooperation across dynamic environments. Extensive experiments demonstrate that our platform simulates confrontations involving up to twenty agents per side, providing empirical guidance for algorithm selection in various settings.

RA-L 2026-03-25

A Cross-Embodiment Gripper Benchmark for Rigid-Object Manipulation in Aerial and Industrial Robotics

Marek Vagas, Martin Varga, Jaroslav Romancik, Ondrej Majercak, Alejandro Suarez, Anibal Ollero, et al.

UAVs / Aerial Robots · Manipulation and Robot Arms
Abstract

Robotic grippers are increasingly deployed across industrial, collaborative, and aerial platforms, where each embodiment imposes distinct mechanical, energetic, and operational constraints. Established YCB and NIST benchmarks quantify grasp success, force, or timing on a single platform, but do not evaluate cross-embodiment transferability or energy-aware performance, both essential for modern mobile and aerial manipulation. This letter introduces the Cross-Embodiment Gripper Benchmark (CEGB), a reproducible benchmarking suite extending YCB and selected NIST metrics with three additional components: a transfer-time benchmark measuring embodiment exchange effort, an energy-consumption benchmark evaluating grasping and holding efficiency, and an intent-specific ideal payload assessment. Together, these metrics characterize grasp performance and cross-platform suitability. CEGB is validated on two mechanically distinct grippers. The experimental evaluation quantifies embodiment-dependent differences in transfer time, energetic efficiency, and operational capability under unified statistical reporting. CEGB provides a reproducible foundation for cross-platform, energy-aware gripper evaluation.

RA-L 2026-03-25

Toward Simplicity and Practicality: A Novel Framework and Guidance for Robotic Table Tennis Applications

Qitong Guo, Xiaohang Shi, Ruoyu Jia, Chunxin Yang, Kenichi Murakami, Yuji Yamakawa

Robot Learning · Human-Robot Interaction / Teleoperation
Abstract

Although many impressive advances in table tennis robots have been reported using reinforcement learning, challenges related to policy complexity and adaptability continue to hinder large-scale deployment and practical applications. In this work, building upon extensive prior studies, we propose several techniques and integrate them into a unified framework that reduces the difficulty of training and deployment while enhancing the human player's enjoyment. Specifically, a recursively nested design is introduced in the hierarchical decision-making system, which fully separates high-level decision-making from low-level execution, eliminating the need to train multiple agents at the execution layer. The multi-objective problem is also studied by introducing a tolerance-based weight modulation mechanism, which can balance the landing accuracy with other objectives and adjust the playing strategy to meet diverse goals. Comprehensive experiments demonstrate the effectiveness of the proposed framework and techniques. Notably, the robot achieved up to 18 consecutive rallies with a human player, which, to the best of our knowledge, is the longest rally attained with a fixed-base collaborative robot arm. The proposed methods can be readily extended to other racket-sport robots or similar tasks. Our project website: https://guoqitong.github.io/ttrobot-hrl/ .

RA-L 2026-03-25

Distributionally Robust Acceleration Control Barrier Filter for Efficient UAV Obstacle Avoidance

Dnyandeep Mandaokar, Bernhard Rinner

UAVs / Aerial Robots · Control and Dynamics
Abstract

Dynamic obstacle avoidance (DOA) for unmanned aerial vehicles (UAVs) requires fast reaction under limited onboard resources. We introduce the distributionally robust acceleration control barrier function (DR-ACBF) as an efficient collision avoidance method maintaining safety regions. The method constructs a second-order control barrier function as linear half-space constraints on commanded acceleration. Latency, actuator limits, and obstacle accelerations are handled through an effective clearance that considers dynamics and delay. Uncertainty is mitigated using Cantelli tightening with per-obstacle risk. A DR-conditional value at risk (DR-CVaR) early trigger expands margins near violations to improve DOA. To meet the real-time avoidance-control rate of 100 Hz, we use fixed-time Gauss-Southwell projections instead of quadratic programs (QP). Simulation results show avoidance performance similar to QP with 31% lower computational load, and the method outperforms state-of-the-art baseline approaches. Experiments with Crazyflie UAVs demonstrate the feasibility of our approach.
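
Editor's sketch: two ingredients named in the abstract can be illustrated generically. Cantelli's one-sided inequality gives a tightening factor k = sqrt((1 - risk) / risk), so that enforcing a half-space constraint against the mean bound plus k standard deviations keeps the violation probability at or below the chosen risk, whatever the distribution. A fixed number of greedy projections onto the most violated half-space is then a cheap stand-in for a QP solve. The code below is not the authors' DR-ACBF: the function names, the greedy rule, and the iteration budget are all assumptions, and the result is a feasible command when the loop converges, not necessarily the QP minimizer.

```python
import math

def cantelli_factor(risk):
    """Return k such that P(X >= mean + k*std) <= risk by Cantelli's inequality."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must be in (0, 1)")
    return math.sqrt((1.0 - risk) / risk)

def tightened_bound(b_mean, b_std, risk):
    """Replace the constraint dot(n, a) >= b by dot(n, a) >= b_mean + k*b_std."""
    return b_mean + cantelli_factor(risk) * b_std

def greedy_project(a_des, constraints, iters=8):
    """Push a desired acceleration toward the set {a : dot(n, a) >= b for all (n, b)}.

    Each of the fixed `iters` steps projects onto the most violated half-space.
    """
    a = list(a_des)
    for _ in range(iters):
        worst, worst_violation = None, 1e-12
        for n, b in constraints:
            gap = b - sum(ni * ai for ni, ai in zip(n, a))
            violation = gap / math.sqrt(sum(ni * ni for ni in n))
            if violation > worst_violation:
                worst, worst_violation = (n, gap), violation
        if worst is None:
            break  # every constraint is already satisfied
        n, gap = worst
        scale = gap / sum(ni * ni for ni in n)
        a = [ai + scale * ni for ai, ni in zip(a, n)]
    return a

if __name__ == "__main__":
    b = tightened_bound(b_mean=1.0, b_std=0.5, risk=0.1)
    print(greedy_project([0.0, 0.0, 0.0], [([1.0, 0.0, 0.0], b)]))
```

The fixed iteration count bounds the worst-case runtime per control cycle, which is the property that matters for a 100 Hz loop on a small onboard computer.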

RA-L 2026-03-25

Decentralized Multi-Robot Herding via Partial Containment Strategy

Dac Dang Khoa Nguyen, Thiviyathinesvaran Palani, Hiroaki Fukushima, Alen Alempijevic, Gavin Paul

Multi-Robot / Swarms · Control and Dynamics
Abstract

Effective herding of animals is essential for tasks such as livestock management and wildlife conservation, and multi-robot systems offer a promising solution to automate such processes. Existing approaches include reactive methods, which are efficient and straightforward but struggle with inconsistent animal behavior, and formation-based methods, which handle variability but face scalability issues as animals resist being fully encircled. We propose a novel, decentralized herding algorithm that combines the strengths of both approaches. The key contribution of this work is a novel partial containment formation enabled by control barrier functions (CBFs) that constrain animal movements and guide them toward a particular direction. Theoretical analyses highlight convergence and stability properties, and empirical results also reveal the role of animal aggregation preferences in herding performance, an aspect largely overlooked in prior work. Extensive simulations and real-robot experiments validate the theoretical analysis, demonstrating consistent performance and efficiency under various scenarios and disturbances. Finally, benchmarks against other state-of-the-art herding algorithms indicate that the proposed algorithm performs more consistently across diverse scenarios.

RA-L 2026-03-25

Sampling Augmented Bilevel Trajectory Optimization for Constrained Quadrotor Flights

Qiang Li, Lu Wang, Wenxing Fu

UAVs / Aerial Robots · Navigation / SLAM / Autonomous Driving
Abstract

Planning smooth trajectories through a sequence of waypoints under nonconvex constraints is challenging due to the coupling between coefficient optimization and time allocation. Existing gradient-based spline trajectory optimization methods tend to be susceptible to local minima and poor initializations, or restrained by complicated gradient computations. We propose a sampling augmented bilevel optimization (SABO) approach that integrates gradient-based optimization with correlated spatio-temporal sampling for improved robustness and optimality. Through temporal normalization, the closed-form solution of coefficient optimization becomes an explicit function of segment durations, while the Hessian becomes linear in their powers, enabling analytic bilevel gradient computation without using finite differences or linearized constraints. Correlated mutations are subsequently performed around the gradient-induced solution to further explore the constrained spatio-temporal space, with sample projection and covariance matrix adaptation to guide sampling towards low-cost, feasible regions. Simulations show that SABO outperforms existing methods in terms of optimality and robustness. We validate SABO in flight experiments conducted on a quadrotor.

RA-L 2026-03-30

A Semantic-Aware Integrated A$^{*}$ and Artificial Potential Field Path Planning Framework

Wei Zhou, Zhouyingmiao Chen

Navigation / SLAM / Autonomous Driving
Abstract

This letter addresses the challenge of reliable path planning for mobile robots navigating complex, semantically constrained environments. We propose a systematically integrated path planning framework that combines the A$^{*}$ algorithm with the artificial potential field (APF) method. Planning efficiency is improved through an adaptive cost function and an Environment-Aware Adaptive Robot Motion Block (EA-RMB) strategy, while navigation safety is reinforced by incorporating high-level semantic constraints, such as prohibited zones, via cost penalization. Furthermore, the repulsive component of the APF is integrated into the heuristic method of the proposed algorithm, enabling the planner to proactively avoid zones with dense obstacles during the global search process. Extensive simulations and experiments demonstrate that the proposed algorithm significantly improves planning efficiency while maintaining highly competitive path quality. Furthermore, it rigorously enforces semantic constraints, ensuring safe and reliable navigation in complex environments.
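
Editor's sketch: the two ideas that the abstract spells out, an APF repulsive term inside the A* heuristic and prohibited zones handled by cost penalization, can be shown on a small 4-connected grid. This is a generic illustration under stated assumptions, not the paper's planner: the adaptive cost function and the EA-RMB strategy are not reproduced, and the names `repulsive` and `plan`, the potential parameters, and the penalty value are hypothetical. Adding the repulsive term makes the heuristic inadmissible, so the returned path is not guaranteed to be the shortest one.

```python
import heapq
import math

def repulsive(cell, obstacles, d0=2.0, eta=1.0):
    """Classic APF repulsive potential summed over obstacle cells within range d0."""
    u = 0.0
    for ox, oy in obstacles:
        d = math.hypot(cell[0] - ox, cell[1] - oy)
        if 0.0 < d <= d0:
            u += 0.5 * eta * (1.0 / d - 1.0 / d0) ** 2
    return u

def plan(start, goal, size, obstacles, prohibited=frozenset(), penalty=50.0, w_rep=1.0):
    """Grid A* with heuristic = Euclidean distance + w_rep * repulsive potential.

    Cells in `prohibited` stay traversable but add `penalty` to the step cost.
    Returns a list of cells from start to goal, or None if the goal is unreachable.
    """
    obstacles = set(obstacles)

    def h(c):
        return math.hypot(goal[0] - c[0], goal[1] - c[1]) + w_rep * repulsive(c, obstacles)

    open_heap = [(h(start), 0.0, start)]
    g = {start: 0.0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g_cur, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale heap entry
        closed.add(cur)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dx, cur[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            if nxt in obstacles or nxt in closed:
                continue
            g_new = g_cur + 1.0 + (penalty if nxt in prohibited else 0.0)
            if g_new < g.get(nxt, math.inf):
                g[nxt] = g_new
                parent[nxt] = cur
                heapq.heappush(open_heap, (g_new + h(nxt), g_new, nxt))
    return None

if __name__ == "__main__":
    wall = {(2, 0), (2, 1), (2, 2), (2, 3)}
    print(plan((0, 0), (5, 0), size=6, obstacles=wall))
```

Because the repulsive term raises the heuristic near obstacles, cells close to clutter are expanded later, which biases the search toward open space before any local avoidance is needed.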

AuRo 2026-04-20

Diver interest via pointing in three dimensions: 3D pointing reconstruction for diver-AUV communication

Chelsey Edge, Demetrious Kutzke, Megdalia Bromhal, Junaed Sattar

Perception and Sensing
Abstract

This paper presents Diver Interest via Pointing in Three Dimensions (DIP-3D), a method to indicate an object of interest from a diver to an autonomous underwater vehicle (AUV) by pointing that includes three-dimensional distance information to discriminate between multiple objects in the AUV’s field of view. Traditional dense stereo vision for distance estimation underwater is challenging because of the relative lack of saliency of scene features and degraded lighting conditions. Yet in many applications, including distance information is necessary for robotic perception of diver pointing when multiple objects appear within the robot’s image view. We sidestep the challenges of underwater distance estimation by using sparse reconstruction of specific keypoints in both the left and right images from the robot’s stereo camera to perform pose estimation. Triangulated pose keypoints, along with any object detection method, enable DIP-3D to infer the location of an object of interest when multiple objects are in the AUV’s field of view. By allowing the scuba diver to point at an arbitrary object of interest and enabling the AUV to autonomously decide which object the diver is pointing to, this method permits more natural interaction between AUVs and humans in underwater human-robot collaborative tasks.
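To make the sparse-reconstruction idea concrete, the sketch below triangulates two arm keypoints from a rectified stereo pair and picks the object closest to the resulting pointing ray. It is an illustration under stated assumptions, not DIP-3D itself: the rectified pinhole model, the elbow-to-wrist ray, the nearest-to-ray selection rule, and every function and parameter name here are hypothetical.

```python
import numpy as np


def triangulate(u_left, u_right, v, f, baseline, cx, cy):
    """Recover a 3D point (camera frame, metres) from one rectified stereo match.

    f is the focal length in pixels, baseline the camera separation in metres,
    (cx, cy) the principal point in pixels.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point cannot be triangulated")
    z = f * baseline / disparity
    return np.array([(u_left - cx) * z / f, (v - cy) * z / f, z])


def pointed_object(origin, tip, objects):
    """Return the name of the object nearest to the ray origin -> tip, or None.

    `objects` maps a name to a 3D centroid; objects behind the tip are ignored.
    """
    direction = tip - origin
    direction = direction / np.linalg.norm(direction)
    best_name, best_dist = None, np.inf
    for name, centre in objects.items():
        rel = np.asarray(centre, dtype=float) - tip
        along = rel @ direction
        if along <= 0:
            continue  # object lies behind the pointing hand
        perpendicular = np.linalg.norm(rel - along * direction)
        if perpendicular < best_dist:
            best_name, best_dist = name, perpendicular
    return best_name
```

Because the ray lives in 3D, two objects that overlap in the image but sit at different depths produce different perpendicular distances, which is the discrimination the abstract attributes to including distance information.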

JFR 2026-04-20

Intelligent Autonomy: A Novel Hybrid Navigation System for Autonomous Load‐Haul‐Dump Vehicles

Yuanjian Jiang, Pingan Peng, Xiaofeng Huo, Jiaheng Wang, Liguan Wang

Navigation / SLAM / Autonomous Driving
Abstract

The automation and intelligence of underground mining vehicles are vital for ensuring safety and improving production efficiency, representing an essential trend in the evolution of the mining industry. However, achieving autonomous navigation for load‐haul‐dump (LHD) vehicles in GPS‐denied underground environments poses significant challenges. To address these challenges, we introduce a novel hybrid navigation (HN) strategy that combines the strengths of absolute navigation (AN), which relies on precise localization using pre‐mapped environments, with reactive navigation (RN), which utilizes real‐time sensor data for immediate navigation decisions. In this strategy, the AN facilitates map‐referenced positioning during turns, while the RN dynamically adjusts the trajectory on straight segments through real‐time sensor feedback, independent of absolute localization. This integration enhances the robustness of navigation. We conducted simulation experiments to compare RN, AN, and HN systems. The results demonstrate that the HN system effectively merges the adaptability of RN with the precision of AN, ensuring reliable navigation through narrow intersections and stable performance on straight paths. Field trials further validated the HN system's ability to operate an LHD vehicle at a linear speed of approximately 1.8 m/s and a turning speed of 0.6 m/s, underscoring its practical applications in real‐world scenarios. These findings highlight the HN system's potential for robust autonomous operation in complex underground environments.

JFR 2026-04-14

Six‐Dimensional Digital Twin System for Autonomous Underwater Vehicles: Conceptualization and Twin Experiments

Lin Yu, Lei Qiao

Navigation / SLAM / Autonomous Driving · Human-Robot Interaction / Teleoperation
Abstract

To promote the efficient, comprehensive, reliable, and low‐cost testing and application of intelligent algorithms for autonomous underwater vehicles (AUVs), this paper proposes an innovative six‐dimensional digital twin (6D DT) conceptual model and provides detailed engineering implementation strategies of this twin system. This model integrates six core dimensions, including physical entity, virtual entity, virtual native entity (VNE), twin data, services, and communication connection. The concept of VNE is introduced to significantly enhance the practicability, security, and reliability of AUV testing by constructing diversified test scenarios. To implement the proposed model, a high‐fidelity underwater Cyberspace visualization is developed using Unreal Engine 5, which improves the granularity of virtual–real mapping and enhances human–computer interaction. An efficient data bridge plugin is implemented to ensure real‐time, stable bidirectional communication. The DT system (DTS) supports both offline simulation and online DT modes, enabling flexible testing from pure software simulation to real‐time virtual–physical interaction, thereby enhancing the credibility of algorithm validation. Two experimental cases conducted on this DTS demonstrate the technical feasibility and reliability of the proposed conceptual model. The approach provides a valuable reference for applying digital twin technology in underwater unmanned systems and accelerates the development of autonomous intelligent AUVs.

JFR 2026-04-19

Actuation Strategies for Underwater Jet‐Propelled Soft Robots

Angel Kitone, Mohammed Anteet, Pawandeep S. Matharu, Yara Almubarak

Medical / Soft / Micro-Nano
Abstract

This review article examines jet‐propulsion mechanisms in underwater soft robotic systems, focusing exclusively on physically fabricated and experimentally validated robots. Covering research published from 2013 to 2025, this study classifies and evaluates jet‐propulsion robots based on their actuation mechanisms. This review outlines the fundamental working principles, discusses the key advantages and limitations, and assesses the practicality of these mechanisms for underwater applications. Additionally, a list of robots employing each actuation method is presented, illustrating the diversity of approaches within the field. By systematically comparing these studies, this review identifies critical performance trade‐offs, potential challenges, and opportunities for innovation in jet‐propulsion‐based underwater robotics. Ultimately, this work serves as a valuable resource for researchers and engineers, facilitating advancements in the design and application of bioinspired underwater jetting robots.

JFR 2026-04-19

Review of Essential Generic Technologies for Visual Perception in Underground Coal Mine Robots

Yuxin Du, Jianhua Zhang, Liang Liang, Bin Song

Perception and Sensing
Abstract

Visual perception technology serves as the core support for establishing the “perception‐decision‐control” closed‐loop system of underground coal mine robots (UCMRs). Research on its key generic technologies is crucial to promoting the large‐scale deployment of UCMRs in underground mines. This paper initially reviews the development status of UCMRs and elaborates on the functional roles and application scenarios of five core categories: tunneling, coal mining, auxiliary operations, inspection, and rescue robots, thereby clarifying the fundamental requirements of visual perception across diverse underground scenes. Subsequently, it concentrates on the generic technological system essential for UCMR visual perception, presenting an in‐depth analysis of the principles, research advancements, and application challenges related to four technical modules: environmental adaptability under extreme conditions, target detection and recognition in complex scenarios, 3D spatial perception and positioning, and lightweight algorithms suitable for edge computing. Finally, aligned with practical industry needs, the paper projects future development trends and practical challenges, proposing four strategic pathways for breakthroughs, namely human‐machine‐environment integrated active perception, automated iteration of datasets and models, cluster perception and interaction, and cross‐technology integration, and highlighting four core challenges, including environmental adaptability, hardware reliability, communication collaboration, and industry access compatibility. It aims to provide a valuable theoretical reference and technical support for the technological development, achievement transformation, and advancement of mine intellectualization.

JFR 2026-04-12

Optimization of Magnetic Adsorption Units for Wall‐Climbing Robots via Integrated Response Surface Methodology and Genetic Algorithm

Zan Wang, Qixiang Zhang, Jinghua Wu, Shuaikang Li

Legged / Quadruped Robots · Control and Dynamics
Abstract

To address the requirement for stable adsorption of climbing robots in confined spaces during ship hull cleaning operations, this study proposes a multi‐objective optimization design method for arc‐shaped magnetic adsorption units based on the response surface method‐genetic algorithm. By analyzing the influence relationships among magnetic circuit topology, hull plate thickness, and air gap dimensions, the study reveals the superior performance of a five‐loop Halbach array within specific air gap ranges and identifies the nonlinear impact of arc angle on magnetic adsorption units. A multi‐parameter coupling optimization framework linking arc angle to permanent magnet geometric parameters is established to determine the optimal configuration of permanent magnet structural parameters and arc angle. Experimental validation demonstrates a 15.37% increase in magnetic adsorption force per unit mass after optimization, with robotic stability confirmed through motion testing. This research holds significant value for achieving stable adsorption and engineering applications of wall‐climbing robots on ship hull surfaces.

AuRo 2026-04-11

Multi-robot navigation in social mini-games: definitions, taxonomy, and algorithms

Rohan Chandra, Shubham Singh, Wenhao Luo, Katia Sycara

Navigation / SLAM / Autonomous Driving · Multi-Robot / Swarm
Abstract

The “Last Mile Challenge” has long been considered an important, yet unsolved, challenge for autonomous vehicles, public service robots, and delivery robots. A central issue in this challenge is the ability of robots to navigate constrained and cluttered environments that have high agency (e.g., doorways, hallways, corridor intersections), often while competing for space with other robots and humans. We refer to these environments as “Social Mini-Games” (SMGs). Traditional approaches designed for multi-robot navigation (MRN) do not perform well in SMGs, which has led to focused research on dedicated SMG solvers. However, publications on SMG navigation research make different assumptions (on centralized versus decentralized, observability, communication, cooperation, etc.), and have different objective functions (safety versus liveness). These assumptions and objectives are sometimes left implicit or described informally. This makes it difficult to establish appropriate baselines for comparison in research papers, and difficult for practitioners to find the papers relevant to their concrete application. Such ad-hoc representation of the field also presents a barrier to new researchers wanting to start research in this area. SMG navigation research requires its own taxonomy, definitions, and evaluation protocols to guide effective research moving forward. This survey is the first to catalog SMG solvers using a well-defined and unified taxonomy and to classify existing methods accordingly. It also discusses the essential properties of SMG solvers, defines what SMGs are and how they appear in practice, outlines how to evaluate SMG solvers, and highlights the differences between SMG solvers and general navigation systems. The survey concludes with an overview of future directions and open challenges in the field. Our project is open-sourced at https://socialminigames.github.io/.

RA-L 2026-03-26

Contact-Aware Morphology Optimization via Physically Consistent Differentiable Simulation

Filippo Luca Ferretti, Diego Ferigo, Alessandro Croci, Carlotta Sartore, Omar G. Younis, Silvio Traversaro, et al.

Control and Dynamics
Abstract

Optimizing robot morphology and behavior remains an open challenge, primarily because hardware parameters affect system and contact dynamics, and naive parametrization can yield physically inconsistent designs and unreliable gradients. We propose a simulation-based co-design approach for gradient-based morphology optimization through intermittent contact by expressing inertial quantities, collision geometry, and contact forces as smooth functions of bounded hardware parameters. This parametrization preserves physical consistency across admissible designs and enables trajectory-level differentiation through contact transitions. The methodology is simulator-agnostic and is instantiated in a JAX-based differentiable simulator to leverage accelerator execution and automatic differentiation. We validate the approach by optimizing a single-leg morphology to reach a target jump height under torque limits. Controlled studies further assess correctness: synthetic system identification recovers ground-truth parameters to numerical precision, cross-engine calibration against MuJoCo reaches MSE below $10^{-5}$, and second-order derivatives match finite differences. Compared to genetic algorithms, gradient-based optimization is substantially faster but more sensitive to initialization, while remaining computationally practical for co-design.

RA-L 2026-03-26

STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multi-Modal Robot Memory

Mingfeng Yuan, Hao Zhang, Mahan Mohammadi, Runhao Li, Jinjun Shan, Steven L. Waslander

Navigation / SLAM / Autonomous Driving
Abstract

Mobile robots are often deployed over long durations in diverse open, dynamic scenes, including indoor settings such as warehouses and manufacturing facilities, and outdoor settings such as agricultural and roadway operations. A core challenge is to build a scalable long-horizon memory that supports an agentic workflow for planning, retrieval, and reasoning over open-ended instructions at variable granularity, while producing precise, actionable answers for navigation. We present STaR, an agentic reasoning framework that (i) constructs a task-agnostic, multi-modal long-term memory that generalizes to unseen queries while preserving fine-grained environmental semantics (object attributes, spatial relations, and dynamic events), and (ii) introduces a Scalable Task-Conditioned Retrieval algorithm based on the Information Bottleneck principle to extract from long-term memory a compact, non-redundant, information-rich set of candidate memories for contextual reasoning. We evaluate STaR on NaVQA (mixed indoor/outdoor campus scenes) and WH-VQA, a customized warehouse benchmark with many visually similar objects built with Isaac Sim, emphasizing contextual reasoning. Across the two datasets, STaR consistently outperforms strong baselines, achieving higher success rates and markedly lower spatial error. We further deploy STaR on a real Husky wheeled robot in both indoor and outdoor environments, demonstrating robust long-horizon reasoning, scalability, and practical utility.

RA-L 2026-03-26

Design and Implementation of a Dynamic Morphing Uncrewed Aerial–Aquatic Vehicle

Zhenjiang Wang, Yunhua Jiang, Zikun Zhen, Yifan Jiang, Jinxiu Zhang

UAV / Aerial Robots
Abstract

Uncrewed Aerial–Aquatic Vehicles (UAAVs) have broad application potential in air–water missions, but efficient performance in two distinct media and stable transition remain challenging. This paper proposes a multirotor lifting-wing UAAV with dynamic morphing named Diving Hawk-4. The foldable rotor arms, designed with a parallelogram mechanism, can fold and unfold during motion. In the folded state, the vehicle reduces its envelope diameter by up to 63.8%, allowing it to traverse narrow spaces in aerial and underwater environments. The projected area decreases by 51.5%, which reduces the impact force during water entry. A two-stage morphing strategy enables the vehicle to exit the water within 0.7 s and then efficiently correct its attitude. The lifting wings reduce rotor thrust by 49.5% in simulation. Underwater, the folded state reduces drag by 38%, while the unfolded state increases turning rate by 91%. Aerodynamic and hydrodynamic evaluations, together with multi-domain tests, demonstrate the effectiveness of dynamic morphing.

RA-L 2026-03-25

Helical and Planar Continuum Robot Shapes by Structural Tube Modifications and Backbone Length Control

Camille Benoist, Cédric Girerd, Carlo Saija, Nabil Zemiti, Philippe Poignet, Pierre Berthet-Rayne

Medical / Soft / Micro-Nano
Abstract

Navigating tortuous anatomical pathways, such as the aortic arch, requires continuum robots capable of complex three-dimensional deformations. Existing designs are often limited to specific curve families, and extending their shape capabilities usually increases mechanical complexity and system size. In this paper, we present a novel hybrid continuum robot that combines a multi-backbone structure with a flexible notched tube. Control rods enable bending, compression, and extension, while the tube's structural modifications induce torsion along the backbone. This allows the robot to achieve both planar and helical shapes in a compact form factor. Simulations and experiments were performed on a prototype with an outer diameter of 7 mm and length of 107.4 mm, achieving mean backbone shape and tip position errors of 1.25 mm and 1.63 mm, respectively, in EM-tracking experiments. A smaller prototype was used to demonstrate the feasibility of navigating through an aortic arch phantom, highlighting the potential for miniaturized implementations in highly constrained environments.

RA-L 2026-03-25

Attitude Estimated Constraint Circular Motion Control for Underactuated Unicycle Robots

Pushkal Purohit, Anoop Jain

Control and Dynamics
Abstract

This paper investigates the problem of motion confinement for an underactuated unicycle robot while stabilizing it along a desired circular trajectory. Only the robot's position is assumed to be available for feedback; its heading angle is not measured. To address this, we design an observer that estimates the robot's position and heading angle using only the available position measurements. Leveraging these state estimates and the barrier Lyapunov function (BLF), we propose a variable gain-based control design relying on a composite Lyapunov function that simultaneously captures the desired control objectives and convergence of observer errors. We show that the proposed controller not only asymptotically stabilizes the motion of the unicycle robot on the desired circular path but also restricts its trajectories within the predefined circular boundary almost everywhere. Furthermore, we elaborate on how the state constraints imposed in the original problem formulation are effectively enforced through the estimated variables. Both numerical simulations and real-time experiments on the Khepera IV robot are provided.

RA-L 2026-03-25

A Unified Framework for Transient Analysis in Queueing Networks

Taojun Wang, Qing Wang, Jingshan Li

Control and Dynamics
Abstract

This paper presents a unified framework for transient analysis of cascaded M/M/c/K networks with split and merge under multiple service policies, including first-come-first-served (FCFS), priority, percentage, and circulate. A history-buffer mechanism is introduced to preserve temporal arrival composition, enabling propagation of nonstationary departures across interconnected nodes without enlarging the state space. Coupled with time-dependent Kolmogorov forward equations, a stable explicit iteration is developed to compute transient queue-length distributions. Numerical experiments verify that the proposed method can accurately reproduce full transient dynamics across multi-stage networks.
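For readers unfamiliar with the underlying machinery, the sketch below integrates the Kolmogorov forward equations of a single M/M/c/K node with an explicit (forward Euler) iteration. It covers only the textbook single-node case; the paper's history-buffer mechanism, split/merge routing, and service policies are not reproduced, and the function names and the `safety` step-size factor are assumptions. The step is chosen so that `dt` times the largest exit rate stays at or below `safety`, which keeps the probabilities non-negative.

```python
import numpy as np


def mmck_generator(lam, mu, c, K):
    """Generator matrix Q of an M/M/c/K queue with states 0..K customers."""
    Q = np.zeros((K + 1, K + 1))
    for n in range(K + 1):
        if n < K:
            Q[n, n + 1] = lam              # arrival accepted while below capacity
        if n > 0:
            Q[n, n - 1] = min(n, c) * mu   # up to c servers work in parallel
        Q[n, n] = -Q[n].sum()              # rows of a generator sum to zero
    return Q


def transient_distribution(lam, mu, c, K, t_end, p0=None, safety=0.5):
    """Queue-length distribution at time t_end via forward Euler on dp/dt = p Q."""
    Q = mmck_generator(lam, mu, c, K)
    if p0 is None:
        p = np.zeros(K + 1)
        p[0] = 1.0                         # start empty
    else:
        p = np.asarray(p0, dtype=float)
    max_rate = np.max(-np.diag(Q))
    steps = max(1, int(np.ceil(t_end * max_rate / safety)))
    dt = t_end / steps
    for _ in range(steps):
        p = p + dt * (p @ Q)
    return p
```

For large `t_end` the result approaches the stationary distribution; for an M/M/1/K queue that distribution is proportional to (λ/μ)^n, which gives a quick sanity check.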

RA-L 2026-03-25

Geometrically-Constrained Radar-Inertial Odometry via Continuous Point-Pose Uncertainty Modeling

Wooseong Yang, Dongjae Lee, Minwoo Jung, Ayoung Kim

Navigation / SLAM / Autonomous Driving
Abstract

Radar odometry is crucial for robust localization in challenging environments; however, the sparsity of reliable returns and distinctive noise characteristics impede its performance. This paper introduces geometrically-constrained radar-inertial odometry and mapping that jointly consolidates point and pose uncertainty. We employ the continuous trajectory model to estimate the pose uncertainty at any arbitrary timestamp by propagating uncertainties of the control points. These pose uncertainties are continuously integrated with heteroscedastic measurement uncertainty during point projection, thereby enabling dynamic evaluation of observation confidence and adaptive down-weighting of uninformative radar points. By leveraging quantified uncertainties in radar mapping, we construct a high-fidelity map that improves odometry accuracy under imprecise radar measurements. Moreover, we reveal the effectiveness of explicit geometrical constraints in radar-inertial odometry when incorporated with the proposed uncertainty-aware mapping framework. Extensive experiments on diverse real-world datasets demonstrate the superiority of our method, yielding substantial performance improvements in both accuracy and efficiency compared to existing baselines.

RA-L 2026-03-25

LED Pouch Motor: Wavelength Selective Wireless Actuation of Dyed Liquid-to-Gas Phase Change Actuators Using LEDs

Kazuki Yamaura, Hiroki Ishizuka, Takefumi Hiraki

Human-Robot Interaction / Teleoperation
Abstract

Untethered soft actuators that operate without physical connections to external power sources or control system cables are an active area of research for their potential applications in human-robot interaction, haptic devices, and 4D printing due to their wide range of motion and high degrees of freedom. Among these actuators, liquid-to-gas phase change actuators have a simple structure and can generate higher output force using photothermal conversion. In this study, we propose a novel control method for untethered soft actuators by dyeing the low-boiling-point liquids used in liquid-to-gas phase change actuators. The dyed low-boiling-point liquids exhibit different light absorption spectra matching their color, enabling wavelength-selective actuation using light sources with corresponding wavelengths. We evaluated and discussed this control method in terms of the relationship between irradiance and output, actuation speed, and continuous operation. Additionally, we prototyped a robotic arm to demonstrate selective motion according to the light color.

JFR 2026-04-15

Development of Dual‐Functional Flexible Ultrasonic Sensor Array of Proximity Sensing and Material Recognition for Safety Control of Robot

Lijie Zhou, Yuan Li, Zhan Duan, Zhentao Zhou, Tiezhu Liu, Jia Zhang

Perception and Sensing
Abstract

The operational safety of robots in complex environments is a critical consideration in contemporary robotics applications, since it directly impacts both task completion and the robot's own integrity. Currently, safety control in robots is predominantly achieved through multimodal sensor fusion technologies involving vision, LiDAR, and radar systems. However, these approaches still present limitations such as high hardware costs, large data volumes, complex data processing requirements, and slow response times. Herein, we present a dual‐modal ultrasonic sensing system that simultaneously measures object proximity and identifies material properties, thereby enhancing robotic perception. The system addresses the key challenges mentioned above, achieving 98% classification accuracy across 13 common industrial materials and maintaining a proximity sensing error within 3 mm. It also preserves real‐time performance, with response times on the order of milliseconds. These outcomes contribute to safer robot control and improve environmental suitability.

JFR 2026-04-07

A Feature‐Decoupled and Gated‐Interaction‐Enhanced Deep Reinforcement Learning for Path‐Following of Large‐Inertia Vessels

Gang Chen, Zihao Wang, Xinhao Zhao, Jianbo Zheng, Baoan Li, Chenguang Yang, et al.

Robot Learning · Control and Dynamics
Abstract

Path‐following control for large‐inertia surface vessels remains challenging due to slow yaw dynamics and environmental uncertainties. This paper proposes a hybrid framework integrating Line‐of‐Sight (LOS) guidance, drift‐angle compensation, and an Adaptive Gated Interaction Twin Delayed Deep Deterministic Policy Gradient (AGI‐TD3) controller. The primary innovation lies in the hierarchical feature‐processing architecture of the AGI‐TD3 agent. By implementing decoupled encoding for heterogeneous state‐action subspaces, the network isolates interference from disparate physical semantics. Furthermore, an adaptive gated interaction mechanism is introduced to selectively modulate information flows, reinforcing the causal relationship between heading errors and control actions. Simulation results and full‐scale experiments on a 40 m‐class vessel demonstrate that the proposed method significantly improves path following performance and robustness under realistic operating conditions.

RA-L 2026-03-26

Origami-Based Inner Channel Securing Mechanism for Soft Growing Robots

Sanghun Lee, Nam Gyun Kim, Shinwoo Park, Dae-Young Lee, Jee-Hwan Ryu

Abstract

Soft growing robots possess unique advantages, such as the ability to navigate confined and complex environments. Although securing a stable inner channel is essential for enabling a wide range of applications, such as sensor integration, tool passage, and material transfer, the pressure required for eversion-based growth imposes compressive loading on the channel, causing it to collapse by constriction. Existing approaches for channel stabilization rely on auxiliary actuation or complex control, which introduce leakage, deformation, or limited scalability. To overcome these limitations, this paper presents an origami-inspired mechanism embedded into the robot membrane that inherently forms a structurally rigid inner channel without additional actuation, while preserving the intrinsic benefits of soft growing robots. The design further enables user-defined customization of channel geometry to meet application-specific requirements. A comprehensive modeling framework is developed to characterize the geometric and mechanical behavior of the mechanism, and its validity is confirmed through simulations and comparative experiments against conventional soft growing robots. Demonstrations, including inner channel visualization, steering, growth through confined paths, and parameter-dependent scalability, validate the practicality and versatility of the proposed approach.

RA-L 2026-03-26

Development of a Capacitance-Based Sensor for Flexible Rolling Contact Joint Systems

Hao Liu, Jia Yu, Ting Wang, Hongqiang Wang

Abstract

This paper presents a rolling-contact joint embedded with a capacitive angle sensing method fabricated using a flexible printed circuit board process. The method leverages the inherent mechanical constraints of the joint's crossed-band structure, enabling high-precision angular measurement without requiring additional rigid supports. It is applicable to both rigid and soft joint configurations, offering a compact, lightweight, and compliant sensing solution with an extended measurement range compared to conventional flexible sensors. A 2-DOF cable-driven joint with decoupled motion was realized through optimized cable routing, and a 3-DOF robotic arm integrating the proposed sensor achieved real-time position control. The sensing approach was further validated in inflatable soft joints, demonstrating adaptability across diverse robotic platforms.

JFR 2026-04-10

Fusion‐Guided and Distillation‐Optimized Framework for Freespace Detection in Off‐Road Environments

Jing Lian, Duo Sui, Linhui Li, Xiaofang Yuan, Yaonan Wang, Haoyuan Kang, et al.

Perception and Sensing
Abstract

In response to the challenges posed by the dynamic and diverse terrain of off‐road environments, the limited scale of data samples, and the insufficient generalization capability of models, this paper proposes an off‐road freespace detection framework called OFF‐LIP SAM. This framework employs a cascaded fusion approach, using dense point clouds as prompts to interactively guide the inference of the Segment Anything Model (SAM) vision large model. The framework first constructs a point cloud densification algorithm, calculating multi‐frame point cloud pose transformation matrices through a lidar inertial odometer and building a dense local point cloud map using a dynamic adaptive keyframe strategy. A dynamic keypoint sampling algorithm is then designed to construct feature point clouds, which are used as prior information to interactively generate regions of interest, enabling knowledge and data to jointly drive SAM model segmentation inference. Finally, this study proposes and implements a two‐stage knowledge distillation framework based on dynamic sampling prompts. This significantly improves the SAM inference speed while maintaining accuracy and generalization, making it applicable directly in resource‐constrained real‐vehicle environments. The experimental results demonstrate that the proposed OFF‐LIP SAM detection framework achieves competitive performance on the ORFD, RELLIS‐3D, and WildScenes data sets, outperforming current state‐of‐the‐art algorithms and strong baseline models. Real‐vehicle experiments conducted on the ORIN vehicle‐mounted computing platform validate the feasibility and effectiveness of the OFF‐LIP SAM detection framework.

RA-L 2026-03-25

Reconfigurable Straight-Spoke Tri-Wheel Mechanism With Four Bar Linkage for Optimal Stair Climbing

Liran Zhou

Abstract

In this paper, we propose a new four-bar linkage reconfigurable wheel for stair climbing. Previous works on reconfigurable wheel-based stairclimbers focus on curved-spoke geometries, but we propose a Y-shaped straight-spoke geometry and show why it is superior via a static analysis. Our mechanism adds reconfigurability to the straight-spoke structure, allowing it to both travel smoothly on flat surfaces and climb steep stairs. A four-bar linkage is used for the expansion mechanism, allowing the wheel to mimic a straight-spoke design when fully expanded. We conduct a kinematic study of our mechanism to find the theoretical path during ascent. Then, we simulated our mechanism with PyBullet, finding that the new mechanism achieves stronger results in both ascending and descending stairs when compared to standard curved-spoke designs and similar results when compared to the straight-spoke design. Following this, we analyze and compare the simulated paths of each design. Finally, we investigated boundary failure cases, focusing on the average angular velocity to further compare the ascent capabilities of our design against the curved-spoke design.

JFR 2026-03-27

A Wheeled Robot Inspection System for Long‐Term Operation in Large‐Scale Industrial Environments

Chenpeng Yao, Chengju Liu, Hong Chen, Qijun Chen

Navigation / SLAM / Autonomous Driving · Perception and Sensing · Control and Dynamics
Abstract

Robotic navigation and object detection technologies have advanced significantly. However, deploying inspection systems in large‐scale industrial environments, particularly for long‐term operations, remains challenging due to the lack of a comprehensive software and hardware platform. To address these challenges, this paper presents a wheeled robotic inspection system designed for sustained operation in large‐scale industrial settings. A novel roadmap construction method is introduced to optimize spatial structures for real‐time processing. Additionally, a feedback mechanism is proposed to ensure stable and high‐performance operation over extended periods. The system is further supported by a hardware platform that seamlessly integrates with the software framework, enhancing overall operational performance and reliability. Experimental results validate the effectiveness of the proposed method, while real‐world testing demonstrates the system's feasibility and stability for long‐term deployment. This work provides a comprehensive solution for robotic inspection in large‐scale environments, offering a practical and scalable reference for researchers and practitioners.

JFR 2026-04-01

Development of Spatial Path Tracking Algorithm and Controller for a 6‐SPS Stewart Parallel Manipulator: A Simulation and Experimental Study

Dev Kunwar Singh Chauhan, Pandu R. Vundavilli

Manipulation & Robot Arms · Navigation / SLAM / Autonomous Driving
Abstract

The Stewart parallel robot is popular for its high payload capacity, which it owes to its six prismatic links, and researchers worldwide are exploring it for various applications. In this work, the authors develop an inverse kinematics‐based spatial path tracking algorithm for the Stewart platform that allows it to track circular paths in multiple planes, and they test the algorithm experimentally. First, they establish the inverse kinematics, Jacobian, and singularities of the robot. Next, they plan the robot's motion using a third‐order polynomial in task space. They then develop a motion controller for each individual joint actuator, employing a PID control strategy to control its motion precisely. After that, they control the overall motion of the Stewart manipulator through inverse kinematics, using the actuators' PID‐based motion controllers. The novel path tracking method breaks the whole path into multiple small trajectories and matches the velocities at their endpoints. Finally, the authors use the developed path‐tracking algorithm to generate a circular shape on an aluminum disc, and the algorithm successfully creates the circular form for the incremental form application.
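Two steps named in this abstract, the inverse kinematics of a 6‐SPS platform and a third‐order polynomial segment with matched endpoint velocities, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the hexagonal attachment layout, the roll‐pitch‐yaw convention, and all function names are assumptions made for the example.

```python
import math

def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def hexagon(radius):
    """Six attachment points on a circle in the z = 0 plane (hypothetical layout)."""
    return [(radius * math.cos(k * math.pi / 3),
             radius * math.sin(k * math.pi / 3), 0.0) for k in range(6)]

def leg_lengths(base_pts, plat_pts, position, rpy):
    """Inverse kinematics: length of each prismatic leg for a given platform pose."""
    R = rotation_zyx(*rpy)
    lengths = []
    for b, p in zip(base_pts, plat_pts):
        # Platform attachment point expressed in the base frame.
        world = [position[i] + sum(R[i][j] * p[j] for j in range(3))
                 for i in range(3)]
        lengths.append(math.dist(world, b))
    return lengths

def cubic_coeffs(q0, q1, v0, v1, T):
    """Coefficients of q(t) = a0 + a1 t + a2 t^2 + a3 t^3 on [0, T] with
    q(0) = q0, q(T) = q1, q'(0) = v0, q'(T) = v1."""
    dq = q1 - q0
    a2 = (3.0 * dq - (2.0 * v0 + v1) * T) / T ** 2
    a3 = (-2.0 * dq + (v0 + v1) * T) / T ** 3
    return (q0, v0, a2, a3)

def eval_cubic(c, t):
    """Position and velocity of the cubic segment at time t."""
    pos = c[0] + c[1] * t + c[2] * t ** 2 + c[3] * t ** 3
    vel = c[1] + 2.0 * c[2] * t + 3.0 * c[3] * t ** 2
    return pos, vel

if __name__ == "__main__":
    base = hexagon(0.30)   # assumed base radius in metres
    plat = hexagon(0.20)   # assumed platform radius in metres
    print(leg_lengths(base, plat, (0.0, 0.0, 0.5), (0.0, 0.05, 0.0)))
    print(eval_cubic(cubic_coeffs(0.0, 1.0, 0.2, 0.3, 2.0), 1.0))
```

Chaining such segments so that each one starts with the end velocity of the previous one is one way to read the endpoint velocity matching described in the abstract; each resulting leg length would then serve as the setpoint of that actuator's PID loop.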

JFR 2026-04-01

Redefining Optimal Coverage Path Planning for FLS‐Equipped AUVs With Deep Reinforcement Learning

Lorenzo Cecchi, Alberto Topini, Alessandro Bucci, Alessandro Ridolfi

Navigation / SLAM / Autonomous Driving · Robot Learning
Abstract

Autonomous Underwater Vehicles (AUVs) have emerged as indispensable tools for a variety of subsea tasks, from habitat monitoring and seabed mapping to infrastructure inspection and mine countermeasures. A fundamental challenge in this field is Coverage Path Planning (CPP), the problem of ensuring complete and efficient area coverage. In this work, we propose a Deep Reinforcement Learning (DRL)‐based framework for CPP in underwater environments using a Forward‐Looking Sonar (FLS). We validate the proposed methodology through simulation experiments that compare it with the classical lawnmower path and a state‐of‐the‐art sampling‐based algorithm. Results demonstrate that our DRL‐based solution outperforms these baseline approaches in terms of coverage time per unit area and path length. Additionally, we present field deployment results on the FeelHippo AUV, showcasing the feasibility and practicality of our framework in real‐world underwater missions.
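For readers unfamiliar with the classical lawnmower baseline and the two metrics the abstract reports (path length and coverage time per unit area), a minimal sketch follows. It assumes a rectangular, obstacle‐free area, a fixed swath width, and constant vehicle speed; these simplifications and the function names are illustrative assumptions, and a real FLS footprint is not a fixed‐width swath.

```python
import math

def lawnmower_waypoints(width, height, swath):
    """Back-and-forth (boustrophedon) waypoints covering a width x height rectangle.
    Lanes run along y and are spaced so adjacent swaths tile the full width."""
    n_lanes = max(1, math.ceil(width / swath))
    spacing = width / n_lanes
    waypoints = []
    for i in range(n_lanes):
        x = (i + 0.5) * spacing
        # Alternate sweep direction on every other lane.
        y_start, y_end = (0.0, height) if i % 2 == 0 else (height, 0.0)
        waypoints.append((x, y_start))
        waypoints.append((x, y_end))
    return waypoints

def path_length(waypoints):
    """Total length of the polyline through the waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def coverage_time_per_area(waypoints, speed, area):
    """Travel time at constant speed divided by the covered area."""
    return path_length(waypoints) / speed / area

if __name__ == "__main__":
    wps = lawnmower_waypoints(width=10.0, height=20.0, swath=2.5)
    print(len(wps), path_length(wps))
    print(coverage_time_per_area(wps, speed=1.0, area=10.0 * 20.0))
```

A learned planner would be scored with the same two functions on its own waypoint sequence, which makes the comparison against this baseline direct.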

JFR 2026-04-08

Cover Image, Volume 43, Number 3, May 2026

Zhenliang Zheng, Yongyuan Xu, Xuchun He, Tin Lun Lam, Ning Ding

Abstract

The cover image is based on the article "A Cascaded Strategy With Embodied Artificial Intelligence: Forward Kinematics Solutions for CCRobot-S" by Zhenliang Zheng et al., https://doi.org/10.1002/rob.70140.

JFR 2026-03-23

DURAL: Degradation‐Resistant Robust Adaptive Localization by LiDAR‐Inertial‐UWB‐Wheel Fusion for Coal Mine Robots

Kun Hu, Menggang Li, Zhiwen Jin, Chaoquan Tang, Eryi Hu, Gongbo Zhou

Navigation / SLAM / Autonomous Driving · Perception & Sensing
Abstract

Simultaneous Localization and Mapping (SLAM) in large‐scale, complex, global positioning system (GPS)‐denied underground coal mines poses significant challenges. In these environments, abnormal conditions hinder sensor performance: GPS unavailability impedes scene reconstruction and geographic referencing; uneven or slippery terrain degrades wheel odometer accuracy; and long, feature‐poor tunnels reduce light detection and ranging (LiDAR) effectiveness. To address these challenges, we propose DURAL, a multimodal SLAM framework based on the Iterated Error‐State Kalman Filter that fuses multiple sensors from coal mine robots to overcome individual sensor limitations. First, LiDAR‐inertial odometry is tightly coupled with Ultra‐Wideband (UWB) absolute positioning constraints to establish an absolute coordinate system. Next, the wheel odometer is integrated through tight coupling, enhanced by nonholonomic constraints and vehicle lever arm compensation, to mitigate performance degradation beyond the UWB measurement range. Finally, an adaptive fusion mode switching mechanism dynamically adjusts sensor constraints based on UWB coverage and environmental conditions. Experimental results indicate that our method achieves state‐of‐the‐art accuracy and robustness in both simulated tunnel environments and real‐world underground coal mines. In real‐world experiments, the system attains an absolute pose error of 0.167 m within the UWB range, maintains a relative pose error of 6.53% outside this range, and improves mapping accuracy to 6.456 cm, significantly outperforming existing approaches in challenging mining scenarios.
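The abstract mentions nonholonomic constraints with vehicle lever arm compensation. One common textbook form of that idea, shown here as a sketch and not as DURAL's actual formulation, transfers the velocity estimated at the IMU to the wheel axle and treats its lateral and vertical components as a residual that should be close to zero. The frame convention (x forward, y left, z up, IMU and vehicle frames aligned) and all names are assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def velocity_at_wheel(v_imu, omega, lever_arm):
    """Rigid-body velocity transfer: v_wheel = v_imu + omega x lever_arm.
    All vectors are expressed in the body frame; lever_arm points from IMU to axle."""
    w = cross(omega, lever_arm)
    return tuple(v + d for v, d in zip(v_imu, w))

def nhc_residual(v_imu, omega, lever_arm):
    """Lateral and vertical velocity at the axle; about zero for a non-slipping wheeled vehicle."""
    _, vy, vz = velocity_at_wheel(v_imu, omega, lever_arm)
    return (vy, vz)

if __name__ == "__main__":
    # IMU mounted 0.4 m ahead of the rear axle, vehicle turning at 0.5 rad/s.
    v_imu = (1.0, 0.2, 0.0)
    omega = (0.0, 0.0, 0.5)
    print(nhc_residual(v_imu, omega, (-0.4, 0.0, 0.0)))  # with lever arm compensation
    print(nhc_residual(v_imu, omega, (0.0, 0.0, 0.0)))   # without compensation
```

In the example the IMU sees a lateral velocity purely because it sits ahead of the axle during a turn; without the lever arm term a filter would misread that motion as a constraint violation.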