Balanced Conic Rectified Flow mitigates distribution drift in standard reflow by combining real-data inversions and conic noise interpolation, achieving straighter flows and better generation quality with significantly fewer fake samples.
Rectified flow is a generative model that learns smooth transport mappings between two distributions through an ordinary differential equation (ODE). The model straightens its ODE through reflow steps, which iteratively update the supervisory flow, enabling relatively simple and efficient generation of high-quality images. However, rectified flow still faces several challenges. 1) The reflow process is slow because it requires a large number of generated pairs to model the target distribution. 2) It is well known that using suboptimal fake samples in reflow can degrade the learned flow model, an issue further exacerbated by error accumulation across reflow steps and by the model collapse that self-consuming training causes in denoising autoencoder models. In this work, we go one step further and empirically demonstrate that the reflow process causes the learned model to drift away from the target distribution, which in turn leads to a growing discrepancy in reconstruction error between fake and real images. We reveal this drift problem and design a new reflow step, the *conic reflow*. It supervises the model with inversions of real data points through the previously learned model and their interpolations with random initial points. Conic reflow brings multiple advantages. 1) It keeps the ODE paths anchored toward real samples, as evaluated by reconstruction error. 2) It uses far fewer generated samples: 600K, compared to 4M for standard reflow. 3) The learned model generates higher-quality images, as evaluated by FID, IS, and Recall. 4) The learned flow is straighter than baselines, as evaluated by curvature. We achieve much lower FID in both one-step and full-step generation on CIFAR-10, and conic reflow generalizes to other datasets such as LSUN Bedroom and ImageNet.
In standard reflow, the model is trained only with its own generated pairs \((X_0^\text{fake}, X_1^\text{fake})\). As reflow steps accumulate, these synthetic pairs begin to dominate supervision, causing the model to gradually drift away from the real data distribution. Eventually, the learned trajectories are pulled toward regions populated by fake samples, diverging from the true target distribution.
Initially, fake samples roughly follow the true two-moon distribution. But as reflow progresses, they drift further away and eventually collapse into incorrect regions. This clearly visualizes the distribution drift inherent in standard reflow.
Quantitatively, the KL divergence between fake and real distributions increases at each reflow step. Once the flow starts drifting toward synthetic regions, repeated reflow only reinforces the deviation instead of correcting it.
Reconstruction and perturbed reconstruction errors reveal an additional imbalance: fake images are reconstructed with very small errors, while real images exhibit significantly larger errors, and this discrepancy widens as reflow progresses. This real–fake reconstruction gap indicates that the learned flow is becoming biased toward its own generations and drifting away from the real data manifold.
Our method, Balanced Conic Rectified Flow, augments the standard reflow procedure by introducing conic supervision around real data while still leveraging fake pairs from the original rectified flow. The key idea is to start from numerically inverted real samples and expand their influence to a local conic neighborhood in the noise space using spherical linear interpolation (Slerp), then balance this real-pair supervision with standard fake-pair reflow.
Given a real image \(x_1 \sim \pi_1\), we first compute its reverse noise (inverse initial point) by integrating the ODE of the current flow model backward in time, written compactly as
$$Z_{0,R} = v_\theta^{-1}(x_1).$$
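For concreteness, here is a minimal PyTorch sketch of this inversion, assuming a trained velocity network with the hypothetical signature `v_theta(x, t)` and a plain Euler discretization; the actual solver and step count are implementation choices not fixed by the text:

```python
import torch

@torch.no_grad()
def invert(v_theta, x1: torch.Tensor, num_steps: int = 100) -> torch.Tensor:
    """Integrate dx/dt = v_theta(x, t) backward from a real image x1 (t = 1) to t = 0."""
    x = x1.clone()
    dt = 1.0 / num_steps
    for i in reversed(range(num_steps)):
        # Step from t = (i + 1) * dt down to t = i * dt.
        t = torch.full((x.shape[0],), (i + 1) * dt, device=x.device)
        x = x - dt * v_theta(x, t)
    return x  # Z_{0,R}, the inverse initial point in noise space
```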
To cover a neighborhood of \(Z_{0,R}\) on the Gaussian hypersphere, we draw a random noise vector \(\epsilon \sim \mathcal{N}(0, I)\) and apply spherical linear interpolation (Slerp) with interpolation ratio \(\zeta \in [0, 1]\):
$$\operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta) = \frac{\sin((1 - \zeta)\,\varphi)}{\sin \varphi}\, Z_{0,R} + \frac{\sin(\zeta \,\varphi)}{\sin \varphi}\, \epsilon,$$
$$\varphi = \arccos\!\left( \frac{Z_{0,R} \cdot \epsilon}{\lVert Z_{0,R} \rVert\, \lVert \epsilon \rVert} \right),$$
where the inner product is normalized so that \(\varphi\) is the angle between the two (generally non-unit) noise vectors.
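A direct transcription of the Slerp formula follows, continuing the hypothetical setup above; the per-sample angle is computed on flattened tensors and broadcast back to image shape:

```python
import torch

def slerp(z0: torch.Tensor, eps: torch.Tensor, zeta: float) -> torch.Tensor:
    """Spherical linear interpolation between the inverse noise z0 and fresh noise eps."""
    z0_flat, eps_flat = z0.flatten(1), eps.flatten(1)
    cos_phi = (z0_flat * eps_flat).sum(dim=1) / (
        z0_flat.norm(dim=1) * eps_flat.norm(dim=1)
    )
    phi = torch.acos(cos_phi.clamp(-1 + 1e-7, 1 - 1e-7))
    phi = phi.view(-1, *([1] * (z0.dim() - 1)))  # broadcast per-sample angle
    return (torch.sin((1 - zeta) * phi) * z0 + torch.sin(zeta * phi) * eps) / torch.sin(phi)
```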
Using this Slerp interpolation, we define a conic inverse path from the real sample \(x_1\) for each time \(t \in [0, 1]\):
$$\text{Conic}(x_1, \epsilon, \zeta, t) = t\,x_1 + (1 - t)\,\operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta),$$
so that the trajectory linearly interpolates between the perturbed inverse \(\operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta)\) and the real sample \(x_1\). Over multiple samples of \(\epsilon\) and \(\zeta\), these paths form a cone around each real data point in noise space, hence the name conic reflow.
For a conic real pair, the ideal velocity is given by the difference between the endpoint and the (perturbed) starting point,
$$u_\text{real}(x_1, \epsilon, \zeta) = x_1 - \operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta),$$
and the conic reflow objective minimizes the squared error between this target vector and the model velocity along the conic path:
$$\mathcal{L}_\text{real}(\theta) = \mathbb{E}_{x_1, \epsilon, \zeta, t} \Big[ \big\| x_1 - \operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta) - v_\theta\big(\text{Conic}(x_1, \epsilon, \zeta, t),\, t\big) \big\|^2 \Big],$$
where \(t \sim \exp([0, 1])\), \(\zeta\) follows a dedicated Slerp schedule described below, and the time weighting function \(w_t\) (omitted above) is set to 1 by default. Slerp preserves vector norms on the Gaussian hypersphere and produces smooth semantic transitions, acting as a geometry-aware regularizer that improves the alignment between numerically inverted real samples and the true data manifold. This localized, perturbation-based supervision around real data is also consistent with adversarial robustness and reconstruction-based regularization strategies in inverse problems.
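Putting the pieces together, here is a hedged sketch of the conic real-pair loss, reusing the `invert()` and `slerp()` helpers above; for simplicity it samples \(t\) uniformly, where the paper's \(\exp([0,1])\) time distribution and the \(\zeta\) schedule can be substituted:

```python
import torch

def conic_real_loss(v_theta, x1: torch.Tensor, z0_r: torch.Tensor, zeta: float) -> torch.Tensor:
    """L_real: match v_theta along the conic path from Slerp(Z_0R, eps; zeta) to x1."""
    eps = torch.randn_like(z0_r)          # fresh noise defining the cone
    z0_pert = slerp(z0_r, eps, zeta)      # perturbed inverse on the hypersphere
    t = torch.rand(x1.shape[0], device=x1.device)
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))
    x_t = t_ * x1 + (1 - t_) * z0_pert    # Conic(x1, eps, zeta, t)
    u = x1 - z0_pert                      # ideal constant velocity along the path
    return ((u - v_theta(x_t, t)) ** 2).mean()
```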
At the same time, we retain the original reflow on fake pairs to supervise the broader domain. Let \((Z_{0,F}, Z_{1,F})\) be a fake pair produced by the previous reflow step, and
$$Z_{t,F} = (1 - t)\,Z_{0,F} + t\,Z_{1,F}, \quad t \sim \exp([0, 1]).$$
The fake-pair loss is identical to the standard rectified flow objective:
$$\mathcal{L}_\text{fake}(\theta) = \mathbb{E}_{Z_{0,F}, Z_{1,F}, t} \Big[ \big\| Z_{1,F} - Z_{0,F} - v_\theta(Z_{t,F}, t) \big\|^2 \Big].$$
In practice, each mini-batch contains an index set \(\mathcal{U}_\text{real}\) for real conic pairs and \(\mathcal{U}_\text{fake}\) for fake pairs, with \(\mathcal{U}_\text{real} \cup \mathcal{U}_\text{fake} = \mathcal{N}\), where \(\mathcal{N}\) denotes the index set of the full mini-batch. We define indicator functions \(\chi_\text{real}(i) = 1\) if \(i \in \mathcal{U}_\text{real}\) and 0 otherwise, and similarly \(\chi_\text{fake}(i) = 1\) if \(i \in \mathcal{U}_\text{fake}\). Our full training objective can then be expressed as
$$\min_\theta \int_0^1 \mathbb{E} \Big[ \chi_\text{fake}\, \big\| \dot{Z}_{t,F} - v_\theta(Z_{t,F}, t) \big\|^2 + \chi_\text{real}\, \big\| x_1 - \operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta) - v_\theta\big(\text{Conic}(x_1, \epsilon, \zeta, t), t\big) \big\|^2 \Big]\, dt.$$
Equivalently, this can be viewed as a balanced combination of fake-pair reflow and conic real-pair reflow, where the relative proportions of \(\mathcal{U}_\text{real}\) and \(\mathcal{U}_\text{fake}\) control the trade-off between trajectory straightening (via fake pairs) and real-data anchoring (via conic real pairs). This balanced conic rectified flow simultaneously: (i) mitigates distribution drift by centering supervision around real samples, (ii) maintains coverage of the full domain through fake pairs, and (iii) improves stability and continuity of the learned vector field in the neighborhood of the data manifold.
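As a sketch of how the two losses might be combined in one training step (the real and fake sub-batch sizes stand in for \(|\mathcal{U}_\text{real}|\) and \(|\mathcal{U}_\text{fake}|\); equal weighting of the two terms is an assumption, and an explicit balance coefficient could be added):

```python
import torch

def balanced_loss(v_theta, real_batch, fake_batch, zeta: float) -> torch.Tensor:
    """Balanced objective: conic real-pair loss plus standard fake-pair reflow loss."""
    x1, z0_r = real_batch                  # real images and their precomputed inverses
    z0_f, z1_f = fake_batch                # pairs generated by the previous model
    loss_real = conic_real_loss(v_theta, x1, z0_r, zeta)
    t = torch.rand(z0_f.shape[0], device=z0_f.device)
    t_ = t.view(-1, *([1] * (z0_f.dim() - 1)))
    z_t = (1 - t_) * z0_f + t_ * z1_f      # linear interpolation path
    loss_fake = (((z1_f - z0_f) - v_theta(z_t, t)) ** 2).mean()
    return loss_real + loss_fake
```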
The perturbation strength in Slerp interpolation is governed by the ratio \( \zeta \in [0,1] \). However, applying excessively large perturbations may push the perturbed inverse far away from the region where the reverse noise remains consistent with the real data manifold. To avoid instability, we determine an upper bound \( \zeta_{\max} \) using perturbed reconstruction behaviors of real and fake samples.
As \( \zeta \) increases, real images experience a significantly larger reconstruction degradation compared to fake samples. When this discrepancy grows too rapidly, the perturbation no longer provides stable or meaningful supervision. Therefore, we select \( \zeta_{\max} \) as the largest perturbation level that keeps this behavior well-controlled, ensuring stable conic supervision around each real data point.
Instead of sampling the perturbation ratio \( \zeta \) uniformly, Balanced Conic Reflow uses a Slerp noise schedule that gradually reduces the perturbation level over the course of training. This design mirrors the intuition behind classical diffusion models: begin with broader noise to encourage exploration, and progressively reduce noise to refine alignment with the real data manifold.
For a single conic path, the perturbation schedule follows a smooth, decreasing curve that starts at \( \zeta_{\max} \) at the beginning of training \((t'=1)\) and approaches zero by the end \((t'=0)\):
$$\zeta(t') = \zeta_{\max} \cdot \frac{2\,t'^2}{1 + t'^2}, \qquad t' \in [0, 1].$$
This ensures that early training emphasizes robustness around real inversions using stronger noise, while later stages focus on precise geometric alignment near the true data manifold.
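The per-cone schedule is a one-liner; mapping the training step to \(t' = 1 - \text{step}/\text{total\_steps}\) is our assumption about the parameterization, which reproduces the decreasing behavior described above:

```python
def zeta_schedule(t_prime: float, zeta_max: float) -> float:
    """zeta(t') = zeta_max * 2 t'^2 / (1 + t'^2): equals zeta_max at t' = 1, 0 at t' = 0."""
    return zeta_max * 2.0 * t_prime**2 / (1.0 + t_prime**2)

# Example: zeta_schedule(1.0, 0.3) == 0.3 (start of training), zeta_schedule(0.0, 0.3) == 0.0
```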
During training, we periodically refresh the real sample pairs used for conic supervision. When these refresh cycles occur many times (e.g., every 2,000 steps in a 220,000-step run), we further apply a global noise pattern across the entire training trajectory:
$$[K,\, K{-}1,\, \dots,\, 1,\, 2,\, \dots,\, K].$$
Reading larger indices as stronger perturbation levels, this pattern assigns the largest noise to the outermost stages of training and the smallest to the midpoint: the global schedule decreases linearly through the first half of training and increases symmetrically through the second half.
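The pattern itself is easy to generate; how the \(K\) discrete levels map onto concrete \(\zeta\) values is an implementation detail not fixed by the text:

```python
def global_noise_pattern(K: int) -> list[int]:
    """Return the level pattern [K, K-1, ..., 1, 2, ..., K] over 2K - 1 refresh cycles."""
    return list(range(K, 0, -1)) + list(range(2, K + 1))

# global_noise_pattern(4) -> [4, 3, 2, 1, 2, 3, 4]
```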
Together, the single-conic schedule and the global noise pattern let the model interleave wide-range exploration with precise, real-data-focused refinement, since each refresh cycle still decays to near-zero perturbation, resulting in a more stable and accurate learned flow.
A key goal of rectified flow is to learn straight ODE trajectories that connect noise and real images. To quantify how straight these trajectories are, we measure two complementary metrics: curvature and Initial Velocity Delta (IVD).
Curvature captures the overall deviation of the solution trajectory from a straight line. Lower curvature indicates that the learned flow produces smoother and more linear paths.
While curvature evaluates global path shape, it does not reveal how accurately the model predicts the initial direction of the trajectory. To evaluate one-step quality, we use Initial Velocity Delta (IVD), which compares the model’s predicted initial velocity with the ideal displacement toward the data sample.
In summary, curvature tells us how straight the full trajectory is, while IVD reveals whether the flow already moves in the correct direction at the very beginning. A good rectified flow model must achieve low values in both metrics to enable high-quality one-step and few-step sampling.
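One plausible way to estimate both metrics from a simulated trajectory is sketched below (the paper's exact definitions and normalizations may differ); curvature here is the mean squared gap between the instantaneous velocity and the straight-line displacement \(Z_1 - Z_0\), and IVD is that gap at \(t = 0\):

```python
import torch

@torch.no_grad()
def straightness_metrics(v_theta, z0: torch.Tensor, num_steps: int = 100):
    """Estimate curvature and Initial Velocity Delta (IVD) for trajectories from noise z0."""
    dt, x, velocities = 1.0 / num_steps, z0.clone(), []
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v = v_theta(x, t)
        velocities.append(v)
        x = x + dt * v                           # forward Euler step
    disp = x - z0                                # straight-line displacement z1 - z0
    curvature = torch.stack(
        [((v - disp) ** 2).flatten(1).sum(1) for v in velocities]
    ).mean()                                     # 0 for a perfectly straight flow
    ivd = ((velocities[0] - disp) ** 2).flatten(1).sum(1).mean()
    return curvature.item(), ivd.item()
```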
On CIFAR-10, Balanced Conic Rectified Flow consistently improves FID, Inception Score, and overall sample quality across all sampling regimes — one-step, few-step, and full-step generation. The method produces higher-quality samples at every step while requiring far fewer synthetic pairs.
Our approach also remains effective when applied to stronger baselines such as Rectified++, improving generation quality even when using substantially fewer fake pairs. This demonstrates that conic supervision with refreshed real-data inversions transfers well to different reflow-based models.
Precision and recall show that full-step sampling behaves similarly for both models, but one-step sampling reveals a clear difference: while precision remains nearly identical, Balanced Conic Reflow achieves noticeably higher recall. This indicates that the model covers the real data distribution more broadly without sacrificing accuracy.
Standard reflow gradually drifts away from real data, creating a widening reconstruction error gap between real and fake images. Balanced Conic Reflow mitigates this issue by refreshing real-pair supervision with Slerp perturbations, which progressively narrows the gap and stabilizes the learned velocity field around real images.
On the more complex and multimodal ImageNet 64×64 dataset, Balanced Conic Rectified Flow continues to improve generation quality, achieving lower FID and higher recall while maintaining strong Inception Scores. Even a moderate number of real-pair inversions effectively counteracts distribution drift and improves coverage of the true distribution.
The method expands recall while maintaining precision comparable to the baseline. This balanced improvement mirrors the behavior observed on CIFAR-10, confirming that conic real-pair supervision scales effectively to large-scale datasets.
Balanced Conic Reflow further reduces reconstruction and perturbed reconstruction errors, significantly shrinking the difference between real and fake samples. This demonstrates improved stability around the real data manifold and reduced drift during the reflow process.
Qualitative comparisons show that our method produces clearer and more stable reconstructions under perturbations. The inverted trajectories remain well aligned with the real data, confirming stronger robustness than the original reflow model.
Finally, we evaluate our method on LSUN Bedroom at a high resolution of 256×256. Balanced Conic Rectified Flow continues to outperform the baseline, producing sharper images with improved global structure and finer local details, while using substantially fewer synthetic pairs.
The method shows stronger few-step generation quality and maintains competitive performance under adaptive-step solvers, demonstrating that the benefits of conic real-pair supervision extend to high-resolution and complex indoor scenes.
These results show that Balanced Conic Rectified Flow scales reliably to larger resolutions while maintaining its signature advantages: better sample quality, improved coverage, and strong robustness — all with significantly fewer synthetic pairs than traditional reflow pipelines.
Beyond sample quality, Balanced Conic Rectified Flow improves the trajectory straightness of the learned transport. We track both curvature and Initial Velocity Delta (IVD) to assess how smoothly and accurately the flow connects noise and data.
The visualization below shows how curvature and IVD evolve during training. Our method consistently achieves lower values than the original rectified flow, indicating straighter and more stable trajectories even with fewer fake pairs. The right-side trend further shows that adding an extra reflow step (from 2-rectified to 3-rectified flow) continues to reduce curvature and IVD, reinforcing the effectiveness of our training procedure.
Overall, these results show that Balanced Conic Rectified Flow not only improves FID and recall, but also shapes the underlying vector field into a straighter, more reliable transport map with better-preserved initial velocity directions.
Balanced Conic Rectified Flow can also be applied as a fine-tuning method for existing rectified flow models. Rather than re-training from scratch, we refine a pretrained model using a small number of real-pair conic reflow updates. In our experiments, we fine-tune the official CIFAR-10 rectified flow checkpoints provided by the original authors, using only 60,000 real pairs.
Even with such a small amount of additional supervision, fine-tuning noticeably improves 1-step generation quality, as shown in Figure (a). The model becomes more aligned with the real data manifold and avoids the synthetic bias accumulated during standard reflow.
The impact of fine-tuning extends beyond image quality: curvature and IVD also decrease, so the refined model transports noise along straighter trajectories.
These results demonstrate that our real-pair conic refinement serves as a practical and lightweight upgrade to existing rectified flow models, improving both generation quality and trajectory straightness with minimal additional cost.
We analyze how each component of Balanced Conic Rectified Flow contributes to its stability, geometry, and generation performance. Our ablations focus on the role of Slerp-based conic perturbations and the interaction between real pairs and fake pairs during reflow.
We compare three perturbation strategies around the inverse noise: linear interpolation, Gaussian perturbations, and our Slerp-based conic noise. Linear interpolation alters the norm and leaves the hypersphere, while naive Gaussian noise ignores the angular structure between directions. In contrast, Slerp preserves the norm and smoothly interpolates angles, producing consistent and geometry-aware conic perturbations. Empirically, the Slerp-based perturbation yields lower curvature, lower IVD, and a smaller reconstruction gap between real and fake samples.
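A quick numerical check of the norm claim, using the `slerp()` sketch above; high-dimensional Gaussian norms concentrate near \(\sqrt{d}\), so Slerp stays near the hypersphere while linear interpolation dips inside it:

```python
import torch

torch.manual_seed(0)
a, b = torch.randn(1, 3072), torch.randn(1, 3072)   # e.g., flattened 32x32x3 noise
for zeta in (0.25, 0.5, 0.75):
    lerp = (1 - zeta) * a + zeta * b
    sl = slerp(a, b, zeta)
    print(f"zeta={zeta}: |lerp|={lerp.norm().item():.1f}, "
          f"|slerp|={sl.norm().item():.1f}, |a|={a.norm().item():.1f}")
# lerp norms fall well below |a| (about sqrt(3072) = 55); slerp norms stay close to it.
```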
We also examine how the flow behaves when trained with different combinations of real and fake pairs. Real-only supervision reduces drift toward synthetic samples but can overfit to the limited inverse dataset. Mixing real and fake pairs improves diversity, yet without structured perturbations the geometry remains unstable.
The full configuration—real pairs + fake pairs + Slerp-based perturbations— produces the most stable and effective flow: strong anchoring to real data, broad coverage from fake pairs, and consistent geometric regularization. This combination achieves the best generation quality, highest recall, and the most stable curvature and IVD across datasets.
Overall, these ablations confirm that each component of Balanced Conic Rectified Flow plays a distinct and essential role in mitigating distribution drift and producing high-quality, straight trajectories.