RATS: Reward-Aware Trajectory Shaping for Few-step Visual Generation

Preference-aligned few-step image and video generation without test-time overhead.

Rui Li*, Bingyu Li*, Yuanzhi Liang, Haibin Huang, Chi Zhang, XueLong Li

University of Science and Technology of China   |   TeleAI

* Equal contribution

Figure: Comparison of RATS with teacher imitation and terminal reward optimization. RATS introduces reward-aware trajectory shaping for few-step visual generation, transferring intermediate teacher knowledge only when it remains beneficial under the reward objective.

Abstract

Achieving high-fidelity generation in extremely few sampling steps has long been a central goal of generative modeling. Existing approaches largely rely on distillation-based frameworks to compress the original multi-step denoising process into a few-step generator. However, such methods constrain the student to imitate a stronger multi-step teacher, imposing the teacher as an upper bound on student performance.

We propose Reward-Aware Trajectory Shaping (RATS), a lightweight framework for preference-aligned few-step generation. Teacher and student latent trajectories are aligned at key denoising stages through horizon matching, while a reward-aware gate adaptively regulates teacher guidance based on relative reward performance. RATS improves the efficiency-quality trade-off in few-step visual generation while preserving the deployment efficiency of the student model.

Method

Horizon Matching

Aligns teacher and student latent trajectories across key denoising stages under compressed step budgets.
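
As a rough illustration of horizon matching, the sketch below pairs each student step with the teacher step at the same fraction of the trajectory and averages the squared latent difference at those pairs. The page does not give the exact matching rule or distance, so the proportional mapping, the MSE objective, and the names `matched_teacher_steps` and `horizon_matching_loss` are all assumptions made for illustration.

```python
def matched_teacher_steps(student_steps: int, teacher_steps: int) -> list[int]:
    """For each student step k = 1..K, return the 1-based teacher step
    at the same trajectory fraction k/K (assumed proportional mapping)."""
    if not 1 <= student_steps <= teacher_steps:
        raise ValueError("expected 1 <= student_steps <= teacher_steps")
    # Integer round-half-up of k * T / K, avoiding float rounding surprises.
    return [
        (2 * k * teacher_steps + student_steps) // (2 * student_steps)
        for k in range(1, student_steps + 1)
    ]


def horizon_matching_loss(student_traj, teacher_traj) -> float:
    """Mean squared error between student latents and the teacher latents
    at the matched horizons (assumed distance, not taken from the paper).

    Each trajectory is a list of latents; each latent is a flat list of floats.
    """
    steps = matched_teacher_steps(len(student_traj), len(teacher_traj))
    squared_error, count = 0.0, 0
    for student_latent, t in zip(student_traj, steps):
        teacher_latent = teacher_traj[t - 1]  # 1-based step -> list index
        for s, v in zip(student_latent, teacher_latent):
            squared_error += (s - v) ** 2
            count += 1
    return squared_error / count
```

Under this mapping, a 5-step student trained against a 50-step teacher is compared at teacher steps 10, 20, 30, 40, and 50.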

Reward-Aware Gate

Strengthens teacher guidance when the teacher is more reward-preferred and relaxes it when the student catches up.
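
One way to picture the gate is as a smooth function of the teacher-student reward gap that scales the trajectory-matching term. The sketch below uses a sigmoid for this purpose. The sigmoid form, the `temperature` and `guidance_weight` parameters, and the way the gate is combined with the reward term in `shaped_loss` are assumptions for illustration, not the formulation from the paper.

```python
import math


def reward_gate(teacher_reward: float, student_reward: float,
                temperature: float = 1.0) -> float:
    """Gate in (0, 1): near 1 when the teacher is more reward-preferred,
    near 0 when the student has overtaken it (assumed sigmoid form)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    x = (teacher_reward - student_reward) / temperature
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def shaped_loss(student_reward: float, teacher_reward: float,
                matching_loss: float, guidance_weight: float = 1.0,
                temperature: float = 1.0) -> float:
    """Hypothetical objective: maximize the student's reward while adding a
    teacher-matching penalty scaled by the reward-aware gate."""
    gate = reward_gate(teacher_reward, student_reward, temperature)
    return -student_reward + guidance_weight * gate * matching_loss
```

With equal rewards the gate sits at 0.5; as the student's reward rises above the teacher's, the gate shrinks and the matching term contributes less to the objective.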

Efficient Inference

Uses the multi-step exponential moving average (EMA) teacher only during training, adding no computational overhead at test time.
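
For readers unfamiliar with EMA teachers, the sketch below shows the standard exponential moving average update on a plain dictionary of parameters. It assumes the teacher's weights track the student's weights, which the page does not state explicitly; the function name `ema_update` and the `decay` value are likewise illustrative.

```python
def ema_update(teacher: dict[str, float], student: dict[str, float],
               decay: float = 0.999) -> dict[str, float]:
    """Return new teacher parameters as an exponential moving average of the
    student's: teacher <- decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return {
        name: decay * teacher[name] + (1.0 - decay) * value
        for name, value in student.items()
    }
```

Because the update runs only inside the training loop, the deployed student needs neither the teacher's weights nor any extra forward passes, consistent with the claim of no test-time overhead.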

Figure: Overview of the reward-aware teacher-student trajectory shaping framework in RATS.

Quantitative Results

Image Generation: Baseline vs. RATS

RATS improves FLUX1.0-dev on all three preference metrics at 3, 5, 8, and 50 function evaluations (NFEs), with the largest gains at 3 NFEs.

Method    NFEs  HPS ↑           PickScore ↑    ImageReward ↑
Baseline  3     18.43           19.99          -0.3551
Ours      3     32.15 (+13.72)  22.46 (+2.47)  1.0956 (+1.4506)
Baseline  5     26.12           21.83          0.7443
Ours      5     32.16 (+6.04)   22.68 (+0.85)  1.1337 (+0.3894)
Baseline  8     28.43           22.42          0.9140
Ours      8     33.81 (+5.38)   23.14 (+0.72)  1.3240 (+0.4100)
Baseline  50    29.76           22.59          1.0037
Ours      50    32.95 (+3.19)   22.79 (+0.20)  1.1544 (+0.1507)

Image Generation: Comparison with Few-step Baselines

RATS achieves the best HPS and PickScore across all evaluated NFE settings.

Method     NFEs  HPS ↑  PickScore ↑  ImageReward ↑
Flux       3     18.43  19.99        -0.3551
Hyper-SD   3     28.80  22.14        0.9882
SenseFlow  3     30.63  22.33        1.2030
Ours       3     32.15  22.46        1.0956
Flux       5     26.12  21.83        0.7443
Hyper-SD   5     27.83  22.08        1.0710
SenseFlow  5     30.99  22.53        1.2110
Ours       5     32.16  22.68        1.1337
Flux       8     28.43  22.42        0.9140
Hyper-SD   8     30.50  22.76        1.0410
SenseFlow  8     30.99  22.59        1.1720
Ours       8     33.81  23.14        1.3240
Flux       50    29.76  22.59        1.0037
Hyper-SD   50    30.01  22.51        0.9461
SenseFlow  50    30.69  22.31        1.0810
Ours       50    32.95  22.79        1.1544

Video Generation: VBench Overall Scores

The largest improvements appear at low step counts: at 5 NFEs the VBench total score rises from 65.73 to 78.53 (+12.80), compared with +1.35 at 50 NFEs.

Method  NFEs  Quality Score ↑  Semantic Score ↑  Total Score ↑
Wan     50    83.08            62.93             79.05
Ours    50    83.99 (+0.91)    65.99 (+3.06)     80.40 (+1.35)
Wan     8     77.82            48.74             72.01
Ours    8     82.66 (+4.84)    70.35 (+21.61)    80.20 (+8.19)
Wan     5     73.64            34.10             65.73
Ours    5     81.23 (+7.59)    67.78 (+33.68)    78.53 (+12.80)

Efficiency Comparison

RATS combines few-step generation and preference alignment in 0.83 hours of total training time, compared with 24.35 hours for SenseFlow and 11.78 hours for DanceGRPO, and requires no extra data.

Method     Step Time (s)  Peak Memory (GB)  Per-Step Compute (TFLOPs)  Total Steps (K)  Total Time (h)  Extra Data  Few-Step  Preference Align
SenseFlow  7.31           78.87             801.28                     12.0             24.35           Yes         Yes       No
DanceGRPO  212.71         34.05             1605.00                    0.2              11.78           No          No        Yes
Ours       7.57           67.67             1229.60                    0.4              0.83            No          Yes       Yes
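
The total-time column in the efficiency comparison can be sanity-checked from the step-time and step-count columns. The short sketch below assumes that total time is simply step time multiplied by the number of training steps, which is an assumption about how the figures were measured; the helper name `total_training_hours` is illustrative.

```python
def total_training_hours(step_time_s: float, total_steps_k: float) -> float:
    """Estimate total training time in hours from the per-step time (seconds)
    and the number of steps given in thousands."""
    return step_time_s * total_steps_k * 1000 / 3600


# (step time in s, total steps in K, reported total time in h), from the table
reported = {
    "SenseFlow": (7.31, 12.0, 24.35),
    "DanceGRPO": (212.71, 0.2, 11.78),
    "Ours": (7.57, 0.4, 0.83),
}

for method, (step_time, steps_k, hours) in reported.items():
    estimate = total_training_hours(step_time, steps_k)
    print(f"{method}: estimated {estimate:.2f} h, reported {hours:.2f} h")
```

Each estimate lands within 0.05 hours of the reported value, so the three columns are mutually consistent under this assumption. The reported totals put RATS at roughly 29 times less training time than SenseFlow and 14 times less than DanceGRPO.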

Qualitative Results

Experiments show that RATS consistently improves the efficiency-quality frontier, narrowing the gap between few-step students and stronger multi-step generators for both image and video generation.

Figure: FLUX image generation case study from the main paper.
Figure: WAN video generation case study from the main paper.

Reward Dynamics

Figure: Reward dynamics and teacher-student gap analysis under different few-step sampling budgets.

Citation

@article{li2026rats,
  title   = {Reward-Aware Trajectory Shaping for Few-step Visual Generation},
  author  = {Li, Rui and Li, Bingyu and Liang, Yuanzhi and Huang, Haibin and Zhang, Chi and Li, XueLong},
  journal = {arXiv preprint arXiv:2604.14910},
  year    = {2026}
}