RATS: Reward-Aware Trajectory Shaping for Few-step Visual Generation

Preference-aligned few-step image and video generation without test-time overhead.

Rui Li*, Bingyu Li*, Yuanzhi Liang, Haibin Huang, Chi Zhang, XueLong Li

University of Science and Technology of China | TeleAI

* Equal contribution

Comparison of RATS with teacher imitation and terminal reward optimization. RATS introduces reward-aware trajectory shaping for few-step visual generation, transferring intermediate teacher knowledge only when it remains beneficial under the reward objective.

Abstract

Achieving high-fidelity generation in extremely few sampling steps has long been a central goal of generative modeling. Existing approaches largely rely on distillation-based frameworks that compress the original multi-step denoising process into a few-step generator. However, such methods train the student to imitate a stronger multi-step teacher, which effectively makes the teacher an upper bound on student performance.

We propose Reward-Aware Trajectory Shaping (RATS), a lightweight framework for preference-aligned few-step generation. Teacher and student latent trajectories are aligned at key denoising stages through horizon matching, while a reward-aware gate adaptively regulates teacher guidance based on relative reward performance. RATS improves the efficiency-quality trade-off in few-step visual generation while preserving the deployment efficiency of the student model.

Method

Horizon Matching

Aligns teacher and student latent trajectories across key denoising stages under compressed step budgets.
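The page does not specify how student and teacher stages are paired. Below is a minimal sketch under two assumptions that are not from the paper: both samplers use uniform time grids over the same interval, so the end of student step k lines up with teacher step (k + 1)·N/K, and the alignment penalty is a mean-squared error between matched latents. `matched_teacher_indices` and `horizon_matching_loss` are hypothetical names, not the authors' code.

```python
import numpy as np


def matched_teacher_indices(num_student_steps: int, num_teacher_steps: int) -> list[int]:
    """Pair each student step with the teacher step that ends at the same time.

    Assumption (hypothetical): both samplers walk a uniform time grid, so
    student step k (0-based) ends where teacher step (k + 1) * N / K ends.
    """
    if num_teacher_steps % num_student_steps != 0:
        raise ValueError("this sketch needs N to be a multiple of K")
    stride = num_teacher_steps // num_student_steps
    return [stride * (k + 1) for k in range(num_student_steps)]


def horizon_matching_loss(student_traj: np.ndarray, teacher_traj: np.ndarray) -> float:
    """Mean squared distance between student latents and their matched teacher latents.

    student_traj has shape (K + 1, ...) and teacher_traj has shape (N + 1, ...);
    index 0 of each holds the shared initial noise latent and is not penalized.
    """
    num_student_steps = student_traj.shape[0] - 1
    num_teacher_steps = teacher_traj.shape[0] - 1
    idx = matched_teacher_indices(num_student_steps, num_teacher_steps)
    per_stage = [
        np.mean((student_traj[k + 1] - teacher_traj[n]) ** 2)
        for k, n in enumerate(idx)
    ]
    return float(np.mean(per_stage))


# Toy usage: a 4-step student against a 20-step teacher on 8-dim latents.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((21, 8))
student = teacher[[0, 5, 10, 15, 20]] + 0.1  # offset every matched latent by 0.1
print(matched_teacher_indices(4, 20))  # [5, 10, 15, 20]
print(round(horizon_matching_loss(student, teacher), 4))  # 0.01
```

A real implementation would compare latents produced by the actual student and teacher samplers and could weight stages differently; the uniform pairing here is only the simplest choice that makes "key denoising stages" concrete.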

Reward-Aware Gate

Strengthens teacher guidance when the reward model prefers the teacher's outputs and relaxes it as the student catches up.
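One way to realize this behavior, offered only as an illustration: score the teacher's and the student's samples with the same reward model, then scale the matching loss by a weight that grows with the teacher's reward lead and drops to zero once the student matches or beats the teacher. The tanh form and the `temperature` parameter are assumptions, not the paper's gate, and the function names are hypothetical.

```python
import math


def reward_aware_gate(teacher_reward: float, student_reward: float,
                      temperature: float = 0.1) -> float:
    """Hypothetical gate weight in [0, 1].

    Grows toward 1 as the teacher's reward lead increases, and is exactly 0
    when the student's reward is equal to or higher than the teacher's.
    """
    lead = max(teacher_reward - student_reward, 0.0)
    return math.tanh(lead / temperature)


def gated_matching_loss(match_loss: float, teacher_reward: float,
                        student_reward: float, temperature: float = 0.1) -> float:
    """Scale a trajectory-matching loss by the reward-aware gate."""
    return reward_aware_gate(teacher_reward, student_reward, temperature) * match_loss


# Teacher clearly preferred: guidance stays close to full strength.
print(round(reward_aware_gate(1.2, 0.9), 3))  # 0.995
# Student has caught up: guidance is switched off.
print(reward_aware_gate(1.0, 1.1))  # 0.0
```

The hard zero below parity mirrors the idea of transferring teacher knowledge only while it still helps under the reward; a sigmoid would instead keep some guidance at parity, which is an equally plausible design choice.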

Efficient Inference

Uses the multi-step EMA teacher only during training, adding no computational overhead at test time.
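The page does not say how the EMA teacher is maintained. A common construction, assumed here, keeps the teacher's weights as an exponential moving average of the student's weights and updates them after each optimizer step; at test time only the student runs. The `ema_update` name, the `decay` value, and the dict-of-arrays parameter layout are illustrative assumptions.

```python
import numpy as np


def ema_update(teacher_params: dict[str, np.ndarray],
               student_params: dict[str, np.ndarray],
               decay: float = 0.999) -> None:
    """In-place EMA update: teacher <- decay * teacher + (1 - decay) * student."""
    for name, student_value in student_params.items():
        teacher_params[name] *= decay
        teacher_params[name] += (1.0 - decay) * student_value


student = {"w": np.ones(3)}
teacher = {"w": np.zeros(3)}
ema_update(teacher, student, decay=0.9)
print(teacher["w"])  # [0.1 0.1 0.1]
```

Because the teacher only produces training targets, inference cost is that of the few-step student alone.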

Overview of the reward-aware teacher-student trajectory shaping framework in RATS.

Quantitative Results

Image Generation: Baseline vs. RATS

RATS improves the FLUX.1-dev baseline on all three preference metrics at 3, 5, 8, and 50 NFEs, with the largest gains at 3 NFEs.

Method	NFEs	HPS ↑	PickScore ↑	ImageReward ↑
Baseline	3	18.43	19.99	-0.3551
Ours	3	32.15 (+13.72)	22.46 (+2.47)	1.0956 (+1.4506)
Baseline	5	26.12	21.83	0.7443
Ours	5	32.16 (+6.04)	22.68 (+0.85)	1.1337 (+0.3894)
Baseline	8	28.43	22.42	0.9140
Ours	8	33.81 (+5.38)	23.14 (+0.72)	1.3240 (+0.4100)
Baseline	50	29.76	22.59	1.0037
Ours	50	32.95 (+3.19)	22.79 (+0.20)	1.1544 (+0.1507)

Image Generation: Comparison with Few-step Baselines

RATS achieves the best HPS and PickScore across all evaluated NFE settings.

Method	NFEs	HPS ↑	PickScore ↑	ImageReward ↑
Flux	3	18.43	19.99	-0.3551
Hyper-SD	3	28.80	22.14	0.9882
SenseFlow	3	30.63	22.33	1.2030
Ours	3	32.15	22.46	1.0956
Flux	5	26.12	21.83	0.7443
Hyper-SD	5	27.83	22.08	1.0710
SenseFlow	5	30.99	22.53	1.2110
Ours	5	32.16	22.68	1.1337
Flux	8	28.43	22.42	0.9140
Hyper-SD	8	30.50	22.76	1.0410
SenseFlow	8	30.99	22.59	1.1720
Ours	8	33.81	23.14	1.3240
Flux	50	29.76	22.59	1.0037
Hyper-SD	50	30.01	22.51	0.9461
SenseFlow	50	30.69	22.31	1.0810
Ours	50	32.95	22.79	1.1544

Video Generation: VBench Overall Scores

The largest improvements appear in the low-step regime: the Total Score gain grows from +1.35 at 50 NFEs to +8.19 at 8 NFEs and +12.80 at 5 NFEs.

Method	NFEs	Quality Score ↑	Semantic Score ↑	Total Score ↑
Wan	50	83.08	62.93	79.05
Ours	50	83.99 (+0.91)	65.99 (+3.06)	80.40 (+1.35)
Wan	8	77.82	48.74	72.01
Ours	8	82.66 (+4.84)	70.35 (+21.61)	80.20 (+8.19)
Wan	5	73.64	34.10	65.73
Ours	5	81.23 (+7.59)	67.78 (+33.68)	78.53 (+12.80)
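As a quick consistency check, the Total Score column agrees to within about 0.01 with a 4:1 weighted average of Quality Score and Semantic Score, the weighting assumed here for VBench's total score. The snippet recomputes it from the table values; `vbench_total` is a hypothetical helper name.

```python
# (method, NFEs) -> (quality, semantic, reported total), copied from the table.
VBENCH_ROWS = {
    ("Wan", 50): (83.08, 62.93, 79.05),
    ("Ours", 50): (83.99, 65.99, 80.40),
    ("Wan", 8): (77.82, 48.74, 72.01),
    ("Ours", 8): (82.66, 70.35, 80.20),
    ("Wan", 5): (73.64, 34.10, 65.73),
    ("Ours", 5): (81.23, 67.78, 78.53),
}


def vbench_total(quality: float, semantic: float) -> float:
    """Assumed VBench weighting: quality counts four times as much as semantic."""
    return (4.0 * quality + semantic) / 5.0


for (method, nfes), (quality, semantic, reported) in VBENCH_ROWS.items():
    recomputed = vbench_total(quality, semantic)
    print(f"{method:>4} @ {nfes:>2} NFEs: recomputed {recomputed:.2f}, reported {reported:.2f}")
```

Under this weighting, the large Semantic Score gains at 5 and 8 NFEs contribute only one fifth of their size to the Total Score, which is why the total gains are smaller than the semantic ones.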

Efficiency Comparison

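RATS is the only listed method that provides both few-step generation and preference alignment without extra data, and it requires 0.83 h of total training time, compared with 24.35 h for SenseFlow and 11.78 h for DanceGRPO.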

Method	Step Time (s)	Peak Memory (GB)	Per-Step Compute (TFLOPs)	Total Steps (K)	Total Time (h)	Extra Data	Few-Step	Preference Alignment
SenseFlow	7.31	78.87	801.28	12.0	24.35	Yes	Yes	No
DanceGRPO	212.71	34.05	1605.00	0.2	11.78	No	No	Yes
Ours	7.57	67.67	1229.60	0.4	0.83	No	Yes	Yes
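The Total Time column is roughly Step Time multiplied by Total Steps. Relative to SenseFlow, the saving comes from running far fewer steps (0.4K versus 12K) at a similar step time; relative to DanceGRPO, it comes from much cheaper steps (7.57 s versus 212.71 s). The sketch assumes Total Time is wall-clock time for all listed steps at the listed step time; the small mismatches with the reported hours presumably come from rounding in the table. `estimated_hours` is a hypothetical helper.

```python
# method -> (step time in seconds, total steps in thousands, reported total hours)
EFFICIENCY_ROWS = {
    "SenseFlow": (7.31, 12.0, 24.35),
    "DanceGRPO": (212.71, 0.2, 11.78),
    "Ours": (7.57, 0.4, 0.83),
}


def estimated_hours(step_time_s: float, total_steps_k: float) -> float:
    """Wall-clock hours for total_steps_k * 1000 steps at step_time_s seconds each."""
    return step_time_s * total_steps_k * 1000.0 / 3600.0


for method, (step_time, steps_k, reported) in EFFICIENCY_ROWS.items():
    print(f"{method:>9}: estimated {estimated_hours(step_time, steps_k):.2f} h, "
          f"reported {reported:.2f} h")
```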

Qualitative Results

Experiments show that RATS consistently improves the efficiency-quality frontier, narrowing the gap between few-step students and stronger multi-step generators for both image and video generation.

FLUX image generation case study from the main paper.

Wan video generation case study from the main paper.

Reward Dynamics

Citation

@article{li2026rats,
  title   = {Reward-Aware Trajectory Shaping for Few-step Visual Generation},
  author  = {Li, Rui and Li, Bingyu and Liang, Yuanzhi and Huang, Haibin and Zhang, Chi and Li, XueLong},
  journal = {arXiv preprint arXiv:2604.14910},
  year    = {2026}
}