Tactile-grounded reasoning for multimodal LLMs

Touch-R1: Reinforcing Touch Reasoning in MLLMs

Touch-R1 trains multimodal models to reason from physical contact, using tactile-grounded GRPO over optical tactile streams.

TouchReason-1M · TouchReason-Bench · Qwen2.5-VL-7B

When vision is misleading, touch becomes the evidence.

Touch-R1 reads deformation, shear, depth, and force-conditioned contact cues, then produces structured <perceive> · <compare> · <conclude> reasoning traces.
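One way to check the structured trace described above is sketched below. It tests that the three tags appear once each, in order, and are non-empty. The function name `format_reward` and the binary 0/1 scoring are assumptions for illustration, not the released reward code.

```python
import re

# Hypothetical format check for a <perceive> · <compare> · <conclude> trace.
TAGS = ("perceive", "compare", "conclude")
TRACE_PATTERN = re.compile(
    r"^\s*<perceive>(.+?)</perceive>\s*"
    r"<compare>(.+?)</compare>\s*"
    r"<conclude>(.+?)</conclude>\s*$",
    re.DOTALL,
)


def format_reward(trace: str) -> float:
    """Return 1.0 for a well-formed three-part trace, else 0.0 (assumed binary scoring)."""
    # Each tag must open and close exactly once.
    for tag in TAGS:
        if trace.count(f"<{tag}>") != 1 or trace.count(f"</{tag}>") != 1:
            return 0.0
    match = TRACE_PATTERN.match(trace)
    if match is None:
        return 0.0  # tags out of order, or extra text around the blocks
    # Reject blocks that contain only whitespace.
    return 1.0 if all(part.strip() for part in match.groups()) else 0.0


good = (
    "<perceive>Pad A shows a wider contact patch.</perceive>"
    "<compare>A deforms more than B under similar force.</compare>"
    "<conclude>A is softer.</conclude>"
)
print(format_reward(good))            # 1.0
print(format_reward("A is softer."))  # 0.0
```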

2.82M aligned tactile frames
12.2K audited tactile sequences
4,800 held-out benchmark QA pairs
60.1 Touch-R1-7B average score
Abstract

Physical contact as a reasoning signal.

Touch-R1 introduces R1-style rule-based reinforcement learning to tactile reasoning. We build TouchReason-1M, a multi-sensor dataset with synchronized tactile-force records and verified reasoning QA, and TouchReason-Bench, a benchmark for tactile perception, ordinal comparison, cross-sensor consistency, and visual-tactile conflict. Touch-R1 combines ordinal-aware accuracy, cross-sensor physical-consistency, and structured-format rewards with input-side tactile grounding, enabling MLLMs to revise visual priors using contact evidence.
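An "ordinal-aware" accuracy can be read as giving partial credit that shrinks with the distance between the predicted and reference levels on an ordered scale. The sketch below assumes a K-level scale with linear decay. The scale, its labels, and the decay rule are illustrative assumptions, not the paper's definition.

```python
def ordinal_accuracy(pred: str, gold: str, levels: list[str]) -> float:
    """Partial-credit accuracy on an ordered label scale (assumed linear decay).

    An exact match scores 1.0, the opposite end of the scale scores 0.0, and a
    prediction that is not on the scale scores 0.0. `gold` must be on the scale.
    """
    if pred not in levels:
        return 0.0
    k = len(levels)
    if k < 2:
        return 1.0 if pred == gold else 0.0
    distance = abs(levels.index(pred) - levels.index(gold))
    return 1.0 - distance / (k - 1)


# Hypothetical five-level hardness scale.
HARDNESS = ["very soft", "soft", "medium", "hard", "very hard"]
print(ordinal_accuracy("soft", "soft", HARDNESS))            # 1.0
print(ordinal_accuracy("medium", "soft", HARDNESS))          # 0.75
print(ordinal_accuracy("very hard", "very soft", HARDNESS))  # 0.0
```

Unlike exact match, this scoring still rewards a near miss ("medium" for "soft") more than a distant one.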

Video Examples

Paired RGB and tactile streams.

Short synchronized clips from TouchReason-style acquisition: a soft plastic cup-lid protrusion under press and shear.

Normal Press

Contact progresses continuously from no contact to increasing normal force.

RGB side view · Optical tactile view

Upward Exploratory Motion

Shear-inducing movement reveals a motion-dependent tactile response.

RGB side view · Optical tactile view
TouchReason-1M

Multi-sensor data for tactile reasoning.

TouchReason-1M couples tactile frames, deformation fields, shear, depth, force, metadata, ordinal labels, and reasoning QA.
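The field list above can be made concrete as an in-memory record. This is a minimal sketch: the class name, field names, nested-list shapes, and units are all assumptions for illustration, and the released H5 schema may differ.

```python
from dataclasses import dataclass, field

Vec2 = tuple[float, float]


@dataclass
class TactileRecord:
    """Hypothetical in-memory view of one TouchReason-1M sample (not the released schema)."""

    group_id: str                     # trial group, used for held-out splitting
    frames: list[list[list[float]]]   # T optical tactile frames, each H x W
    displacement: list[list[Vec2]]    # per-frame list of marker displacements (dx, dy)
    shear: list[Vec2]                 # per-frame net tangential (shear) vector
    depth: list[float]                # per-frame indentation depth (units assumed)
    force: list[float]                # per-frame normal force (units assumed)
    metadata: dict[str, str] = field(default_factory=dict)
    ordinal_labels: dict[str, int] = field(default_factory=dict)  # e.g. {"hardness": 2}
    qa: list[tuple[str, str]] = field(default_factory=list)       # (question, answer)

    def __post_init__(self) -> None:
        # Every time-indexed stream must have one entry per tactile frame.
        n = len(self.frames)
        streams = (self.displacement, self.shear, self.depth, self.force)
        if any(len(s) != n for s in streams):
            raise ValueError("tactile and force-aligned streams have different lengths")


record = TactileRecord(
    group_id="g0001",
    frames=[[[0.0, 0.0]], [[0.1, 0.3]]],
    displacement=[[(0.0, 0.0)], [(0.02, 0.01)]],
    shear=[(0.0, 0.0), (0.02, 0.01)],
    depth=[0.0, 0.4],
    force=[0.0, 1.2],
    ordinal_labels={"hardness": 2},
)
print(len(record.frames), record.ordinal_labels)  # 2 {'hardness': 2}
```

The length check in `__post_init__` reflects the dataset's emphasis on synchronized tactile and force streams.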

1,748 groups: held-out trial groups are separated to avoid object and contact-region leakage.
12,183 sequences: H5-readable tactile interactions with force/IMU-aligned contact traces.
230K + 100K: cold-start SFT examples plus GRPO prompts, with eight rollouts per prompt.
Reasoning QA: verified rationales grounded in tactile evidence.
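Splitting by trial group, rather than by sequence, keeps every sequence from one object and contact region on the same side of the split. A minimal sketch of such a group-level split follows. The hash-based assignment, the held-out fraction, and the `group_id` key are assumptions, not the dataset's documented procedure.

```python
import hashlib


def is_held_out(group_id: str, held_out_fraction: float = 0.1) -> bool:
    """Deterministically assign a whole trial group to train or held-out."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    # 48 bits keep the value exactly representable as a float in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < held_out_fraction


def split_by_group(sequences: list[dict], held_out_fraction: float = 0.1):
    """Split sequences so that no group_id appears on both sides."""
    train, held_out = [], []
    for seq in sequences:
        side = held_out if is_held_out(seq["group_id"], held_out_fraction) else train
        side.append(seq)
    return train, held_out


# Toy data: 10 hypothetical groups with 7 sequences each.
sequences = [{"group_id": f"g{i // 7:04d}", "index": i} for i in range(70)]
train, held_out = split_by_group(sequences, held_out_fraction=0.3)
train_groups = {s["group_id"] for s in train}
held_out_groups = {s["group_id"] for s in held_out}
print(train_groups.isdisjoint(held_out_groups))  # True
```

Hashing the group ID makes the assignment reproducible without storing a split file, at the cost of only approximately matching the target fraction.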
Figure: Main-paper overview of TouchReason-1M.
Figure: Representative tactile contact features (raw deformation, displacement, shear, and depth).
Figure: Composition of the generated TouchReason-1M reasoning corpus.
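For marker-based optical tactile sensors, a simple heuristic can separate shear from normal press using the displacement field. A uniform slide shifts all markers in the same direction, so it survives averaging. A symmetric press pushes markers outward, so it cancels in the mean but shows up as radial spread. This is an illustrative sketch under that assumption, not the feature pipeline used to build TouchReason-1M. All function names are hypothetical.

```python
import math

Vec2 = tuple[float, float]


def marker_displacements(reference: list[Vec2], current: list[Vec2]) -> list[Vec2]:
    """Per-marker displacement (dx, dy) relative to a no-contact reference frame."""
    if len(reference) != len(current):
        raise ValueError("reference and current marker lists differ in length")
    return [(cx - rx, cy - ry) for (rx, ry), (cx, cy) in zip(reference, current)]


def net_shear(displacements: list[Vec2]) -> tuple[Vec2, float]:
    """Mean marker displacement as a shear proxy, and its magnitude."""
    n = len(displacements)
    mx = sum(dx for dx, _ in displacements) / n
    my = sum(dy for _, dy in displacements) / n
    return (mx, my), math.hypot(mx, my)


def radial_spread(reference: list[Vec2], current: list[Vec2],
                  center: Vec2 = (0.0, 0.0)) -> float:
    """Mean outward displacement about the contact center, a rough normal-press cue."""
    total = 0.0
    for (rx, ry), (dx, dy) in zip(reference, marker_displacements(reference, current)):
        ux, uy = rx - center[0], ry - center[1]
        norm = math.hypot(ux, uy)
        if norm > 0:
            total += (ux * dx + uy * dy) / norm  # projection onto the outward direction
    return total / len(reference)


ref = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
pressed = [(1.1, 0.0), (-1.1, 0.0), (0.0, 1.1), (0.0, -1.1)]  # symmetric outward push
sheared = [(x + 0.2, y) for x, y in ref]                        # uniform +x slide

print(net_shear(marker_displacements(ref, pressed))[1])  # ~0: press cancels in the mean
print(radial_spread(ref, sheared))                       # ~0: slide has no net outward part
```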
Method

Three-stage tactile-grounded training.

Touch-R1 trains in three stages: learn touch dynamics, align structured QA, then reinforce tactile-grounded reasoning.

Tactile Dynamics

Pretrain a touch encoder by predicting future tactile tokens from contact histories.
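The dynamics objective is next-token prediction over tactile tokens. The sketch below shows the data side (history to next-token pairs) and the cross-entropy loss. It assumes contact states have already been quantized into a small token vocabulary. A count-based bigram predictor stands in for the touch encoder so the objective runs end to end. The tokenization and the bigram stand-in are assumptions. The paper's encoder is a learned model.

```python
import math


def next_token_pairs(tokens: list[int], context: int) -> list[tuple[list[int], int]]:
    """(history, next token) pairs from one tokenized contact sequence."""
    return [(tokens[max(0, t - context):t], tokens[t]) for t in range(1, len(tokens))]


def cross_entropy(probs: list[float], target: int) -> float:
    """Negative log-likelihood of the observed next token."""
    return -math.log(max(probs[target], 1e-12))


def fit_bigram(tokens: list[int], vocab_size: int):
    """Toy stand-in for the touch encoder: add-one-smoothed next-token counts."""
    counts = [[1] * vocab_size for _ in range(vocab_size)]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

    def predict(history: list[int]) -> list[float]:
        row = counts[history[-1]]  # this toy model only looks at the last token
        total = sum(row)
        return [c / total for c in row]

    return predict


# Hypothetical quantized contact states: 0 none, 1 light, 2 medium, 3 strong.
tokens = [0, 1, 2, 2, 3, 3, 3, 1, 0]
pairs = next_token_pairs(tokens, context=4)
predict = fit_bigram(tokens, vocab_size=4)
loss = sum(cross_entropy(predict(h), y) for h, y in pairs) / len(pairs)
print(round(loss, 3), round(math.log(4), 3))  # model loss vs. uniform-guess loss
```

A real encoder would use the full `context` window rather than only the last token. The training pairs and loss keep the same form.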

QA SFT

Align touch, vision, and language to a structured reasoning interface.
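A structured-QA SFT example can be assembled from a multimodal prompt and a three-part target trace. Only the target tokens are supervised. In this sketch, `<image>` and `<tactile>` are hypothetical placeholder tokens and the helper names are illustrative. The label value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_sft_example(question: str, perceive: str, compare: str, conclude: str,
                      n_tactile: int = 1) -> dict[str, str]:
    """Pair a multimodal prompt with a structured target trace.

    "<image>" and "<tactile>" are hypothetical placeholders that a processor
    would later replace with visual and tactile embeddings.
    """
    prompt = "<image>\n" + "<tactile>" * n_tactile + "\n" + question
    target = (
        f"<perceive>{perceive}</perceive>"
        f"<compare>{compare}</compare>"
        f"<conclude>{conclude}</conclude>"
    )
    return {"prompt": prompt, "target": target}


def build_labels(prompt_ids: list[int], target_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate ids and mask the prompt so only the target trace is supervised."""
    input_ids = prompt_ids + target_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return input_ids, labels


example = build_sft_example(
    "Which object is softer, A or B?",
    "A shows a larger contact patch at similar force.",
    "A deforms more than B.",
    "A",
    n_tactile=2,
)
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(labels)  # [-100, -100, -100, 21, 22]
```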

Grounded GRPO

Reward answers that are correct, consistent, parsable, and tactile-dependent.
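"Tactile-dependent" can be measured in several ways. One plausible proxy scores each rollout's answer twice: once with the real tactile input and once with it masked. It then averages the clipped log-likelihood gain. This sketch assumes that form for the grounding term. The function name and the clip value are hypothetical, and the paper's G_tg may be defined differently.

```python
def tactile_grounding_gap(logp_with_touch: list[float],
                          logp_touch_masked: list[float],
                          clip: float = 5.0) -> float:
    """Mean clipped log-likelihood gain from seeing the tactile input.

    Each entry is the log-probability of one rollout's answer, scored once with
    the real tactile input and once with it masked. Clipping keeps a single
    rollout from dominating the mean.
    """
    if len(logp_with_touch) != len(logp_touch_masked) or not logp_with_touch:
        raise ValueError("need matching, non-empty score lists")
    gaps = [min(max(a - b, -clip), clip)
            for a, b in zip(logp_with_touch, logp_touch_masked)]
    return sum(gaps) / len(gaps)


# Rollout 1 relies on touch (large drop when masked); rollout 2 ignores it.
print(tactile_grounding_gap([-2.0, -1.5], [-4.0, -1.5]))  # 1.0
```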

Ordinal-aware accuracy
Cross-sensor consistency
Structured format
Tactile grounding
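The cross-sensor consistency term can be illustrated as agreement between the model's ordinal claims and an independent sensor. The sketch below computes the fraction of item pairs whose predicted ordering matches the ordering of measured normal force, treating near-equal forces as ties. Using force as the reference sensor, the tie tolerance, and the score for the no-decidable-pairs case are all assumptions.

```python
from itertools import combinations


def cross_sensor_consistency(predicted_rank: dict[str, int],
                             measured_force: dict[str, float],
                             tol: float = 0.05) -> float:
    """Fraction of item pairs whose predicted order agrees with the force sensor.

    A higher predicted rank means "more contact force" (assumption). Pairs whose
    measured forces differ by less than `tol` are treated as ties and skipped.
    """
    agree = total = 0
    for a, b in combinations(sorted(predicted_rank), 2):
        force_diff = measured_force[a] - measured_force[b]
        if abs(force_diff) < tol:
            continue
        total += 1
        if (predicted_rank[a] - predicted_rank[b]) * force_diff > 0:
            agree += 1
    # With no decidable pairs there is nothing to contradict, so score 1.0 here.
    return agree / total if total else 1.0


ranks = {"A": 3, "B": 2, "C": 1}
forces = {"A": 2.0, "B": 1.2, "C": 1.5}  # B and C are ordered against the prediction
print(round(cross_sensor_consistency(ranks, forces), 3))  # 0.667
```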
$$J_{\text{Touch-R1}}(\theta) = J_{\text{GRPO}}(\theta; R) + \gamma\, G_{\text{tg}}(\theta), \qquad R = 0.6\,R_{\text{acc}} + 0.3\,R_{\text{cs}} + 0.1\,R_{\text{fmt}}$$
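The scalar reward R above combines the three rule-based terms with weights 0.6, 0.3, and 0.1. GRPO then standardizes rewards within each prompt's rollout group to obtain advantages. The sketch below shows both steps for eight rollouts per prompt. The per-rollout component scores are made-up inputs, and using the population standard deviation is one common choice among several. The γ·G_tg term is added to the objective rather than to R, and γ is not given here, so it is left out.

```python
import statistics

# Weights from R = 0.6 R_acc + 0.3 R_cs + 0.1 R_fmt.
W_ACC, W_CS, W_FMT = 0.6, 0.3, 0.1


def composite_reward(r_acc: float, r_cs: float, r_fmt: float) -> float:
    """Weighted sum of the accuracy, cross-sensor, and format rewards."""
    return W_ACC * r_acc + W_CS * r_cs + W_FMT * r_fmt


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one prompt's rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Eight hypothetical rollouts for one prompt: (R_acc, R_cs, R_fmt).
rollouts = [
    (1.0, 1.0, 1.0), (0.75, 1.0, 1.0), (0.0, 0.5, 1.0), (1.0, 0.0, 0.0),
    (0.5, 0.5, 1.0), (0.0, 0.0, 0.0), (0.25, 1.0, 1.0), (1.0, 1.0, 0.0),
]
rewards = [composite_reward(*r) for r in rollouts]
advantages = group_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Because advantages are relative within the group, a rollout is pushed up only if it beats its siblings on the same prompt, so no learned value model is needed.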
Figure: Main-paper framework of Touch-R1.
Results

Stronger tactile reasoning across benchmark families.

Touch-R1-7B outperforms frontier MLLMs, general VLMs, and tactile-specialist baselines on TouchReason-Bench, reaching a 60.1 average versus 47.5 for SToLa, the strongest baseline shown.

Figure: Main-paper qualitative Touch-R1 reasoning examples.

Model         | TouchReason-Bench Avg
Qwen2.5-VL-7B | 28.0
GPT-4o        | 35.4
Octopi-13B    | 41.7
SToLa         | 47.5
Touch-R1-7B   | 60.1
Model          | OMAE ↓ | L2-EM | CSC  | Avg
Gemini-2.5-Pro | 0.67   | 13.8  | 41.8 | 37.6
SToLa          | 0.45   | 22.1  | 51.9 | 47.5
Touch-R1-7B    | 0.24   | 39.5  | 71.3 | 60.1
Analysis

Compact diagnostics.

Main-paper scale behavior and reward/data ablations.

Citation

BibTeX

@article{touchr1,
  title   = {Touch-R1: Reinforcing Touch Reasoning in MLLMs},
  author  = {Touch-R1 Authors},
  journal = {arXiv preprint},
  year    = {2026}
}