Tactile-grounded reasoning for multimodal LLMs

Touch-R1: Reinforcing Touch Reasoning in MLLMs

Touch-R1 trains multimodal models to reason from physical contact, using tactile-grounded GRPO over optical tactile streams.

TouchReason-1M · TouchReason-Bench · Qwen2.5-VL-7B


When vision is misleading, touch becomes the evidence.

Touch-R1 reads deformation, shear, depth, and force-conditioned contact cues, then produces structured <perceive> · <compare> · <conclude> reasoning traces.
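As a rough illustration of what a structured trace implies downstream, the sketch below checks that a model response contains the three sections in order and extracts their contents. The tag names come from the page; the closing-tag convention (`</perceive>` and so on) and the helper name `parse_trace` are assumptions made for this example, not the project's actual parser.

```python
import re

# Tag names are taken from the page; closing tags such as </perceive> are assumed.
TRACE_TAGS = ("perceive", "compare", "conclude")


def parse_trace(text):
    """Return {tag: content} if the three sections appear in order, else None."""
    sections = {}
    cursor = 0
    for tag in TRACE_TAGS:
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        # Search only after the previous section so the order is enforced.
        match = pattern.search(text, cursor)
        if match is None:
            return None
        sections[tag] = match.group(1).strip()
        cursor = match.end()
    return sections
```

A check of this kind could feed a format reward: a response that parses gets credit, and one with missing or reordered sections does not.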

2.82M aligned tactile frames

12.2K audited tactile sequences

4,800 held-out benchmark QA pairs

60.1 Touch-R1-7B average score

Abstract

Physical contact as a reasoning signal.

Touch-R1 introduces R1-style rule-based reinforcement learning to tactile reasoning. We build TouchReason-1M, a multi-sensor dataset with synchronized tactile-force records and verified reasoning QA, and TouchReason-Bench, a benchmark for tactile perception, ordinal comparison, cross-sensor consistency, and visual-tactile conflict. Touch-R1 combines ordinal-aware accuracy, cross-sensor physical consistency, structured-format control, and input-side tactile grounding, enabling MLLMs to revise visual priors using contact evidence.

Video Examples

Paired RGB and tactile streams.

Short synchronized clips from TouchReason-style acquisition: a soft plastic cup-lid protrusion under press and shear.

Normal Press

Continuous contact, progressing from no contact to increasing normal force.

RGB side view · Optical tactile view

Upward Exploratory Motion

Shear-inducing movement reveals motion-dependent tactile response.

RGB side view · Optical tactile view

TouchReason-1M

Multi-sensor data for tactile reasoning.

TouchReason-1M couples tactile frames, deformation fields, shear, depth, force, metadata, ordinal labels, and reasoning QA.

1,748 groups

Held-out trial groups are separated to avoid object and contact-region leakage.

12,183 sequences

H5-readable tactile interactions with force/IMU-aligned contact traces.

230K + 100K

Cold-start SFT examples plus GRPO prompts with eight rollouts per prompt.

Reasoning QA

Verified rationales grounded in tactile evidence.
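One common way to keep held-out groups leak-free is to assign the split per group rather than per sequence, so every sequence from a given trial group lands on the same side. The sketch below shows that idea with a deterministic hash. The `group_id` field, the 10% default fraction, and both function names are hypothetical; the page does not describe how the actual split was produced.

```python
import hashlib


def assign_split(group_id, heldout_fraction=0.1):
    """Map a group id to 'train' or 'heldout' deterministically."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "heldout" if bucket < heldout_fraction else "train"


def split_by_group(records, heldout_fraction=0.1):
    """Split records so that no group_id appears on both sides."""
    splits = {"train": [], "heldout": []}
    for record in records:
        side = assign_split(record["group_id"], heldout_fraction)
        splits[side].append(record)
    return splits
```

Because the assignment depends only on the group id, re-running the split or adding new sequences to an existing group never moves that group across the boundary.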

Figure: Overview of TouchReason-1M (main paper).

Figure: Representative tactile contact signals: raw deformation, displacement, shear, and depth.

Figure: Composition of the generated TouchReason-1M reasoning corpus.

Method

Three-stage tactile-grounded training.

Touch-R1 trains in three stages: learn touch dynamics, align structured QA, then reinforce tactile-grounded reasoning.

Tactile Dynamics

Pretrain a touch encoder by predicting future tactile tokens from contact histories.

QA SFT

Align touch, vision, and language to a structured reasoning interface.

Grounded GRPO

Reward answers that are correct, consistent, parsable, and tactile-dependent.
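GRPO, as commonly described, scores each rollout relative to the other rollouts sampled for the same prompt instead of using a learned value baseline. The sketch below shows that normalization for one group of eight rewards, matching the eight rollouts per prompt mentioned on this page. The use of the population standard deviation and the `eps` value are assumptions of this example; the page does not state Touch-R1's exact normalization.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's rollout group.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight illustrative rollout rewards for a single prompt (made-up values).
example_rewards = [0.9, 0.4, 0.4, 1.0, 0.1, 0.7, 0.4, 0.5]
example_advantages = group_relative_advantages(example_rewards)
```

Rollouts that beat their group's mean receive positive advantages and the rest receive negative ones. If all eight rewards are equal, every advantage is zero and the group contributes no gradient signal.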

Ordinal-aware accuracy

Cross-sensor consistency

Structured format

Tactile grounding

J_{\text{Touch-R1}}(\theta) = J_{\text{GRPO}}(\theta; R) + \gamma \, G_{\text{tg}}(\theta)

R = 0.6 \, R_{\text{acc}} + 0.3 \, R_{\text{cs}} + 0.1 \, R_{\text{fmt}}
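The reward mix above is a fixed weighted sum, which the sketch below spells out with the weights 0.6, 0.3, and 0.1 from the formula. The `ordinal_accuracy` helper is a hypothetical stand-in for "ordinal-aware accuracy" (linear partial credit by ordinal distance), since the page does not define the component rewards. The tactile-grounding term G_tg is added to the objective separately and is not modeled here.

```python
# Weights are taken from the formula on this page.
WEIGHTS = {"acc": 0.6, "cs": 0.3, "fmt": 0.1}


def ordinal_accuracy(predicted_level, true_level, num_levels):
    """Hypothetical partial credit: 1.0 for an exact match, falling
    linearly to 0.0 at the maximum possible ordinal distance."""
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    return 1.0 - abs(predicted_level - true_level) / (num_levels - 1)


def combined_reward(r_acc, r_cs, r_fmt):
    """R = 0.6 * R_acc + 0.3 * R_cs + 0.1 * R_fmt, each input in [0, 1]."""
    return (
        WEIGHTS["acc"] * r_acc
        + WEIGHTS["cs"] * r_cs
        + WEIGHTS["fmt"] * r_fmt
    )
```

Under these assumptions, an answer that is one level off on a five-level scale but fully consistent and well-formatted scores 0.6 × 0.75 + 0.3 + 0.1 = 0.85.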

Figure: Touch-R1 framework (main paper).

Results

Stronger tactile reasoning across benchmark families.

Touch-R1-7B outperforms frontier MLLMs, general VLMs, and tactile-specialist baselines on TouchReason-Bench.

Figure: Qualitative Touch-R1 reasoning examples (main paper).

TouchReason-Bench average score

Model	Avg
Qwen2.5-VL-7B	28.0
GPT-4o	35.4
Octopi-13B	41.7
SToLa	47.5
Touch-R1-7B	60.1

Model	OMAE ↓	L2-EM	CSC	Avg
Gemini-2.5-Pro	0.67	13.8	41.8	37.6
SToLa	0.45	22.1	51.9	47.5
Touch-R1-7B	0.24	39.5	71.3	60.1
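To make the margins in the table explicit, the short sketch below computes per-metric differences between two rows. The numbers are copied from the table; the `gap` helper is only an illustration.

```python
# Rows copied from the table above, in column order.
METRICS = ("OMAE", "L2-EM", "CSC", "Avg")
RESULTS = {
    "Gemini-2.5-Pro": (0.67, 13.8, 41.8, 37.6),
    "SToLa": (0.45, 22.1, 51.9, 47.5),
    "Touch-R1-7B": (0.24, 39.5, 71.3, 60.1),
}


def gap(model, baseline):
    """Per-metric difference (model minus baseline), rounded to 2 decimals."""
    return {
        metric: round(a - b, 2)
        for metric, a, b in zip(METRICS, RESULTS[model], RESULTS[baseline])
    }


touch_vs_stola = gap("Touch-R1-7B", "SToLa")
```

Against SToLa, Touch-R1-7B lowers OMAE by 0.21 (lower is better) and gains 17.4 points on L2-EM, 19.4 on CSC, and 12.6 on the average.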

Analysis

Compact diagnostics.

Scaling behavior and reward/data ablations from the main paper.

Figure: Touch-R1 scaling behavior across 3B, 7B, and 14B backbones.

Figure: Touch-R1 ablations over data, generation length, and reward weights.

Citation

BibTeX

@article{touchr1,
  title   = {Touch-R1: Reinforcing Touch Reasoning in MLLMs},
  author  = {Touch-R1 Authors},
  journal = {arXiv preprint},
  year    = {2026}
}