Balanced Conic Rectified Flow

Shin Seong Kim, Mingi Kwon, Jaeseok Jeong, Youngjung Uh,
Yonsei University

Balanced Conic Rectified Flow mitigates distribution drift in standard reflow by combining real-data inversions and conic noise interpolation, achieving straighter flows and better generation quality with significantly fewer fake samples.

Abstract


Rectified flow is a generative model that learns smooth transport mappings between two distributions through an ordinary differential equation (ODE). The model learns a straight ODE through reflow steps, which iteratively update the supervisory flow, allowing relatively simple and efficient generation of high-quality images. However, rectified flow still faces several challenges. 1) The reflow process is slow because it requires a large number of generated pairs to model the target distribution. 2) The use of suboptimal fake samples in reflow is known to degrade the learned flow model. This issue is further exacerbated by error accumulation across reflow steps and by model collapse caused by self-consuming training in denoising autoencoder models. In this work, we go one step further and empirically demonstrate that the reflow process causes the learned model to drift away from the target distribution, which in turn leads to a growing discrepancy in reconstruction error between fake and real images. We reveal this drift problem and design a new reflow step, namely the conic reflow. It supervises the model with the inversions of real data points through the previously learned model and their interpolation with random initial points. Our conic reflow brings multiple advantages. 1) It keeps the ODE paths anchored to real samples, as evaluated by reconstruction. 2) It needs only 600K generated samples instead of the 4M used by standard reflow. 3) The learned model generates images of higher quality, evaluated by FID, IS, and Recall. 4) The learned flow is straighter than others, evaluated by curvature. We achieve much lower FID in both one-step and full-step generation on CIFAR-10, and conic reflow generalizes to various datasets such as LSUN Bedroom and ImageNet.

Motivation


Standard Reflow Causes Distribution Drift

High-level illustration of distribution drift

In standard reflow, the model is trained only with its own generated pairs \((X_0^\text{fake}, X_1^\text{fake})\). As reflow steps accumulate, these synthetic pairs begin to dominate supervision, causing the model to gradually drift away from the real data distribution. Eventually, the learned trajectories are pulled toward regions populated by fake samples, diverging from the true target distribution.

Two-moons toy dataset comparison
Two-moons experiment: real data (blue) vs. fake samples (yellow) across reflow steps.

Initially, fake samples roughly follow the true two-moon distribution. But as reflow progresses, they drift further away and eventually collapse into incorrect regions. This clearly visualizes the distribution drift inherent in standard reflow.

KL Divergence and Reconstruction Error Gap

KL divergence over reflow steps
KL divergence increases steadily as fake samples drift further from the real distribution.

Quantitatively, the KL divergence between fake and real distributions increases at each reflow step. Once the flow starts drifting toward synthetic regions, repeated reflow only reinforces the deviation instead of correcting it.

Reconstruction Error Gap between Real and Fake Samples

Reconstruction error gap visualization

Reconstruction and perturbed reconstruction errors reveal an additional imbalance: fake images are reconstructed with very small errors, while real images exhibit significantly larger errors, and this discrepancy widens as reflow progresses. This real–fake reconstruction gap indicates that the learned flow is becoming biased toward its own generations and drifting away from the real data manifold.

Proposed Method


Balanced Conic Rectified Flow (Ours)

Conic reflow illustration

Our method, Balanced Conic Rectified Flow, augments the standard reflow procedure by introducing conic supervision around real data while still leveraging fake pairs from the original rectified flow. The key idea is to start from numerically inverted real samples and expand their influence to a local conic neighborhood in the noise space using spherical linear interpolation (Slerp), then balance this real-pair supervision with standard fake-pair reflow.

Given a real image \(x_1 \sim \pi_1\), we first compute its reverse noise (inverse initial point) via the current flow model:

$$Z_{0,R} = v_\theta^{-1}(x_1).$$
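The inversion \(Z_{0,R} = v_\theta^{-1}(x_1)\) is obtained numerically by integrating the learned ODE backward in time. Below is a minimal sketch with a plain Euler solver; the paper does not prescribe a particular solver, and the function names (`invert`) and the toy constant velocity field are purely illustrative:

```python
import numpy as np

def invert(v, x1, n_steps=100):
    """Numerically invert a flow ODE dx/dt = v(x, t): integrate backward
    from the data point x1 at t=1 to recover an initial noise estimate
    Z_{0,R} at t=0. Plain Euler steps; any ODE solver could substitute."""
    x = np.asarray(x1, dtype=float).copy()
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = i * dt
        x = x - dt * v(x, t)  # one backward Euler step
    return x  # approximate Z_{0,R}

# Sanity check with a field whose flow is exactly straight:
# v(x, t) = x1_target - z0 transports z0 at t=0 to x1_target at t=1.
z0 = np.array([1.0, -2.0])
x1_target = np.array([3.0, 0.5])
const_v = lambda x, t: x1_target - z0
recovered = invert(const_v, x1_target)
print(np.allclose(recovered, z0))  # True
```

For a straight (constant-velocity) flow the Euler inversion is exact up to rounding; for a curved learned flow, the inversion error shrinks as `n_steps` grows.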

To cover a neighborhood of \(Z_{0,R}\) on the Gaussian hypersphere, we draw a random noise vector \(\epsilon \sim \mathcal{N}(0, I)\) and apply spherical linear interpolation (Slerp) with interpolation ratio \(\zeta \in [0, 1]\):

$$\operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta) = \frac{\sin((1 - \zeta)\,\varphi)}{\sin \varphi}\, Z_{0,R} + \frac{\sin(\zeta \,\varphi)}{\sin \varphi}\, \epsilon,$$

$$\varphi = \arccos\!\left( \frac{Z_{0,R} \cdot \epsilon}{\lVert Z_{0,R} \rVert \, \lVert \epsilon \rVert} \right).$$

Using this Slerp interpolation, we define a conic inverse path from the real sample \(x_1\) for each time \(t \in [0, 1]\):

$$\text{Conic}(x_1, \epsilon, \zeta, t) = t\,x_1 + (1 - t)\,\operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta),$$

so that the trajectory linearly interpolates between the perturbed inverse \(\operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta)\) and the real sample \(x_1\). Over multiple samples of \(\epsilon\) and \(\zeta\), these paths form a cone around each real data point in noise space, hence the name conic reflow.
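The Slerp and conic interpolation defined above translate directly into code. A minimal NumPy sketch follows; the near-parallel fallback and the clipping before `arccos` are standard numerical guards rather than part of the formulation:

```python
import numpy as np

def slerp(z, eps, zeta):
    """Spherical linear interpolation between the inverted noise z and a
    fresh Gaussian draw eps, with ratio zeta in [0, 1]. The angle is
    computed on normalized vectors so arccos stays in its domain."""
    z, eps = np.asarray(z, float), np.asarray(eps, float)
    cos_phi = np.dot(z, eps) / (np.linalg.norm(z) * np.linalg.norm(eps))
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))
    if phi < 1e-8:  # nearly parallel vectors: fall back to a linear mix
        return (1 - zeta) * z + zeta * eps
    s = np.sin(phi)
    return (np.sin((1 - zeta) * phi) / s) * z + (np.sin(zeta * phi) / s) * eps

def conic_point(x1, z0r, eps, zeta, t):
    """Point on the conic path: t*x1 + (1-t)*Slerp(z0r, eps; zeta)."""
    return t * np.asarray(x1, float) + (1 - t) * slerp(z0r, eps, zeta)

rng = np.random.default_rng(0)
d = 8
z0r, eps, x1 = (rng.standard_normal(d) for _ in range(3))
print(np.allclose(slerp(z0r, eps, 0.0), z0r))          # True: zeta=0 keeps z0r
print(np.allclose(conic_point(x1, z0r, eps, 0.3, 1.0), x1))  # True: t=1 hits x1
```

Sampling many `(eps, zeta)` pairs and sweeping `t` traces out the cone around each real data point described above.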

For a conic real pair, the ideal velocity is given by the difference between the endpoint and the (perturbed) starting point,

$$u_\text{real}(x_1, \epsilon, \zeta) = x_1 - \operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta),$$

and the conic reflow objective minimizes the squared error between this target vector and the model velocity along the conic path:

$$\mathcal{L}_\text{real}(\theta) = \mathbb{E}_{x_1, \epsilon, \zeta, t} \Big[ \big\| x_1 - \operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta) - v_\theta\big(\text{Conic}(x_1, \epsilon, \zeta, t),\, t\big) \big\|^2 \Big],$$

where \(t \sim \exp([0, 1])\), \(\zeta\) follows a dedicated Slerp schedule, and the time-weighting function \(w_t\) (omitted above for brevity) is set to 1 by default. Slerp preserves vector norms on the Gaussian hypersphere and produces smooth semantic transitions, acting as a geometry-aware regularizer that improves the alignment between numerically inverted real samples and the true data manifold. This localized, perturbation-based supervision around real data is also consistent with adversarial-robustness and reconstruction-based regularization strategies in inverse problems.

At the same time, we retain the original reflow on fake pairs to supervise the broader domain. Let \((Z_{0,F}, Z_{1,F})\) be a fake pair produced by the previous reflow step, and

$$Z_{t,F} = (1 - t)\,Z_{0,F} + t\,Z_{1,F}, \quad t \sim \exp([0, 1]).$$

The fake-pair loss is identical to the standard rectified flow objective:

$$\mathcal{L}_\text{fake}(\theta) = \mathbb{E}_{Z_{0,F}, Z_{1,F}, t} \Big[ \big\| Z_{1,F} - Z_{0,F} - v_\theta(Z_{t,F}, t) \big\|^2 \Big].$$

In practice, each mini-batch contains an index set \(\mathcal{U}_\text{real}\) for real conic pairs and \(\mathcal{U}_\text{fake}\) for fake pairs, with \(\mathcal{U}_\text{real} \cup \mathcal{U}_\text{fake} = \mathcal{N}\). We define indicator functions \(\chi_\text{real}(i) = 1\) if \(i \in \mathcal{U}_\text{real}\) and 0 otherwise, and similarly \(\chi_\text{fake}(i) = 1\) if \(i \in \mathcal{U}_\text{fake}\). Our full training objective can then be expressed as

$$\min_\theta \int_0^1 \mathbb{E} \Big[ \chi_\text{fake}\, \big\| \dot{Z}_{t,F} - v_\theta(Z_{t,F}, t) \big\|^2 + \chi_\text{real}\, \big\| x_1 - \operatorname{Slerp}(Z_{0,R}, \epsilon; \zeta) - v_\theta\big(\text{Conic}(x_1, \epsilon, \zeta, t), t\big) \big\|^2 \Big]\, dt.$$

Equivalently, this can be viewed as a balanced combination of fake-pair reflow and conic real-pair reflow, where the relative proportions of \(\mathcal{U}_\text{real}\) and \(\mathcal{U}_\text{fake}\) control the trade-off between trajectory straightening (via fake pairs) and real-data anchoring (via conic real pairs). This balanced conic rectified flow simultaneously: (i) mitigates distribution drift by centering supervision around real samples, (ii) maintains coverage of the full domain through fake pairs, and (iii) improves stability and continuity of the learned vector field in the neighborhood of the data manifold.
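The two loss terms above can be sketched per sample as follows. This is a hedged illustration, not the paper's implementation: the closed-form `ideal_v` check is only a correctness probe, and a real trainer would batch both terms (mixing \(\mathcal{U}_\text{real}\) and \(\mathcal{U}_\text{fake}\) in each mini-batch) and backpropagate through a neural velocity network:

```python
import numpy as np

rng = np.random.default_rng(1)

def slerp(z, eps, zeta):
    """Norm-preserving interpolation toward a fresh Gaussian draw."""
    cos_phi = np.dot(z, eps) / (np.linalg.norm(z) * np.linalg.norm(eps))
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))
    s = max(np.sin(phi), 1e-8)
    return (np.sin((1 - zeta) * phi) / s) * z + (np.sin(zeta * phi) / s) * eps

def conic_real_loss(v_theta, x1, z0r, eps, zeta, t):
    """|| x1 - Slerp(z0r, eps; zeta) - v_theta(Conic(x1, eps, zeta, t), t) ||^2"""
    start = slerp(z0r, eps, zeta)
    xt = t * x1 + (1 - t) * start
    return float(np.sum((x1 - start - v_theta(xt, t)) ** 2))

def fake_reflow_loss(v_theta, z0, z1, t):
    """Standard rectified-flow objective on a generated pair (Z_0F, Z_1F)."""
    zt = (1 - t) * z0 + t * z1
    return float(np.sum((z1 - z0 - v_theta(zt, t)) ** 2))

# The ideal velocity for a pair is its constant displacement; plugging it
# in drives each term to zero, confirming the targets are consistent.
d = 6
x1, z0r, eps = (rng.standard_normal(d) for _ in range(3))
start = slerp(z0r, eps, 0.4)
ideal_v = lambda x, t: x1 - start
print(conic_real_loss(ideal_v, x1, z0r, eps, 0.4, 0.7))  # 0.0

z0f, z1f = rng.standard_normal(d), rng.standard_normal(d)
print(fake_reflow_loss(lambda x, t: z1f - z0f, z0f, z1f, 0.5))  # 0.0
```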

Maximum Perturbation Magnitude and Noise Scheduling


Maximum Perturbation Magnitude \( \zeta_{\max} \)

The perturbation strength in Slerp interpolation is governed by the ratio \( \zeta \in [0,1] \). However, applying excessively large perturbations may push the perturbed inverse far away from the region where the reverse noise remains consistent with the real data manifold. To avoid instability, we determine an upper bound \( \zeta_{\max} \) using perturbed reconstruction behaviors of real and fake samples.

As \( \zeta \) increases, real images experience a significantly larger reconstruction degradation compared to fake samples. When this discrepancy grows too rapidly, the perturbation no longer provides stable or meaningful supervision. Therefore, we select \( \zeta_{\max} \) as the largest perturbation level that keeps this behavior well-controlled, ensuring stable conic supervision around each real data point.

Determining zeta max from perturbed reconstruction behavior
Determining \( \zeta_{\max} \) by monitoring how perturbed reconstruction errors change for real and fake samples.

Noise Scheduling for Conic Reflow

Instead of sampling the perturbation ratio \( \zeta \) uniformly, Balanced Conic Reflow uses a Slerp noise schedule that gradually reduces the perturbation level over the course of training. This design mirrors the intuition behind classical diffusion models: begin with broader noise to encourage exploration, and progressively reduce noise to refine alignment with the real data manifold.

For a single conic path, the perturbation schedule follows a smooth, decreasing curve that starts at \( \zeta_{\max} \) at the beginning of training \((t'=1)\) and approaches zero by the end \((t'=0)\):

$$\zeta(t') = \zeta_{\max} \cdot \frac{2 t'^2}{1 + t'^2}, \qquad t' \in [0, 1].$$

This ensures that early training emphasizes robustness around real inversions using stronger noise, while later stages focus on precise geometric alignment near the true data manifold.
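The schedule is simple to implement and to check at its endpoints (the function name `zeta_schedule` is illustrative):

```python
def zeta_schedule(t_prime, zeta_max):
    """Single-conic Slerp noise schedule: equals zeta_max at t'=1
    (start of training) and decays smoothly to 0 at t'=0 (end)."""
    return zeta_max * (2.0 * t_prime**2) / (1.0 + t_prime**2)

print(zeta_schedule(1.0, 0.3))  # 0.3  (full perturbation at the start)
print(zeta_schedule(0.0, 0.3))  # 0.0  (no perturbation at the end)
```

Because the curve is quadratic near \(t'=0\), the perturbation fades out faster at the end of training than a linear ramp would, concentrating the final updates near the exact inversions.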

Slerp noise schedules for conic reflow
(a) Slerp noise schedule for a single conic reflow. (b) Total Slerp scheduling when periodically refreshing real sample pairs during training.

During training, we periodically refresh the real sample pairs used for conic supervision. When these refresh cycles occur multiple times (e.g., every 2K steps in a 220K-step run), we further apply a global noise pattern across the entire training trajectory.

\([K,\, K{-}1,\, \dots,\, 1,\, 2,\, \dots,\, K]\)

Here each entry indexes the noise level of one refresh cycle, with larger indices denoting smaller perturbations: the outermost stages of training receive the smallest noise and the midpoint the largest. As a result, the global schedule forms a linearly increasing noise phase in the first half of training and a symmetrically decreasing phase in the second half.
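Assuming the bracketed pattern lists one index per refresh cycle, it can be generated as below; the mapping from an index to an actual \(\zeta\) level is a separate design choice not fixed here, and the function name is hypothetical:

```python
def global_noise_pattern(K):
    """Per-refresh-cycle index pattern [K, K-1, ..., 1, 2, ..., K]
    from the text: it descends from K to 1 over the first half of
    the refresh cycles and climbs back to K over the second half."""
    return list(range(K, 0, -1)) + list(range(2, K + 1))

print(global_noise_pattern(4))  # [4, 3, 2, 1, 2, 3, 4]
```

For a 220K-step run refreshed every 2K steps there are 110 cycles, so a symmetric pattern of length \(2K - 1 = 109\) (plus one repeated endpoint) would cover the run.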

Together, the single-conic schedule and the global noise pattern ensure that the model begins with wide-range exploration and transitions smoothly toward precise, real-data-focused refinement—resulting in a more stable and accurate learned flow.

Curvature and Initial Velocity Delta (IVD)


Measuring Trajectory Straightness

A key goal of rectified flow is to learn straight ODE trajectories that connect noise and real images. To quantify how straight these trajectories are, we measure two complementary metrics: curvature and Initial Velocity Delta (IVD).

Curvature captures the overall deviation of the solution trajectory from a straight line. Lower curvature indicates that the learned flow produces smoother and more linear paths.

Curvature equation

While curvature evaluates global path shape, it does not reveal how accurately the model predicts the initial direction of the trajectory. To evaluate one-step quality, we use Initial Velocity Delta (IVD), which compares the model’s predicted initial velocity with the ideal displacement toward the data sample.

IVD equation

In summary, curvature tells us how straight the full trajectory is, while IVD reveals whether the flow already moves in the correct direction at the very beginning. A good rectified flow model must achieve low values in both metrics to enable high-quality one-step and few-step sampling.
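The exact curvature and IVD formulas appear in the equations referenced above. As a hedged stand-in, straightness can be probed by comparing local velocities against the total displacement of a simulated trajectory, and IVD by comparing the predicted initial velocity with the ideal displacement toward the sample; these are proxy definitions for illustration, not necessarily the paper's exact metrics:

```python
import numpy as np

def simulate(v, z0, n_steps=128):
    """Euler-integrate dx/dt = v(x, t) from noise z0 at t=0 up to t=1,
    recording the velocity used at every step."""
    x, dt, vels = np.asarray(z0, float).copy(), 1.0 / n_steps, []
    for i in range(n_steps):
        u = v(x, i * dt)
        vels.append(u)
        x = x + dt * u
    return x, vels

def curvature_proxy(v, z0, n_steps=128):
    """Mean squared deviation of the local velocity from the overall
    displacement x1 - z0: zero iff the path is perfectly straight."""
    x1, vels = simulate(v, z0, n_steps)
    disp = x1 - np.asarray(z0, float)
    return float(np.mean([np.sum((u - disp) ** 2) for u in vels]))

def ivd_proxy(v, z0, x1):
    """Initial Velocity Delta proxy: gap between the velocity predicted
    at t=0 and the ideal one-step displacement toward the sample."""
    z0 = np.asarray(z0, float)
    return float(np.sum((v(z0, 0.0) - (np.asarray(x1, float) - z0)) ** 2))

z0, x1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
straight_v = lambda x, t: x1 - z0  # constant field -> perfectly straight path
print(curvature_proxy(straight_v, z0))  # 0.0
print(ivd_proxy(straight_v, z0, x1))    # 0.0
```

A perfectly rectified flow drives both proxies to zero, which is exactly the regime where one-step sampling matches full-step sampling.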

Generation Quality


CIFAR-10


On CIFAR-10, Balanced Conic Rectified Flow consistently improves FID, Inception Score, and overall sample quality across all sampling regimes — one-step, few-step, and full-step generation. The method produces higher-quality samples at every step while requiring far fewer synthetic pairs.

Our approach also remains effective when applied to stronger baselines such as Rectified++, improving generation quality even when using substantially fewer fake pairs. This demonstrates that conic supervision with refreshed real-data inversions transfers well to different reflow-based models.

CIFAR-10 quantitative
CIFAR-10 step-wise quality

Precision & Recall

Precision and recall show that full-step sampling behaves similarly between both models, but one-step sampling reveals a clear difference: while precision remains nearly identical, Balanced Conic Reflow achieves noticeably higher recall. This indicates that the model covers the real data distribution more broadly without sacrificing accuracy.

CIFAR-10 precision recall

Reconstruction & Perturbed Reconstruction

Standard reflow gradually drifts away from real data, creating a widening reconstruction error gap between real and fake images. Balanced Conic Reflow mitigates this issue by refreshing real-pair supervision with Slerp perturbations, which progressively narrows the gap and stabilizes the learned velocity field around real images.

CIFAR-10 reconstruction & perturbed reconstruction
  • Reduced reconstruction gap, preventing overfitting to synthetic samples.
  • Lower perturbed reconstruction error, indicating more robust local geometry around real data.

ImageNet 64×64


On the more complex and multimodal ImageNet 64×64 dataset, Balanced Conic Rectified Flow continues to improve generation quality, achieving lower FID and higher recall while maintaining strong Inception Scores. Even a moderate number of real-pair inversions effectively counteracts distribution drift and improves coverage of the true distribution.

ImageNet results

Precision & Recall

The method expands recall while maintaining precision comparable to the baseline. This balanced improvement mirrors the behavior observed on CIFAR-10, confirming that conic real-pair supervision scales effectively to large-scale datasets.

ImageNet precision recall

Reconstruction & Perturbed Reconstruction

Balanced Conic Reflow further reduces reconstruction and perturbed reconstruction errors, significantly shrinking the difference between real and fake samples. This demonstrates improved stability around the real data manifold and reduced drift during the reflow process.

ImageNet recon errors

Qualitative Reconstruction Robustness

Qualitative comparisons show that our method produces clearer and more stable reconstructions under perturbations. The inverted trajectories remain well aligned with the real data, confirming stronger robustness than the original reflow model.

ImageNet recon comparison

LSUN Bedroom 256×256


Finally, we evaluate our method on LSUN Bedroom at a high resolution of 256×256. Balanced Conic Rectified Flow continues to outperform the baseline, producing sharper images with improved global structure and finer local details, while using substantially fewer synthetic pairs.

The method shows stronger few-step generation quality and maintains competitive performance under adaptive-step solvers, demonstrating that the benefits of conic real-pair supervision extend to high-resolution and complex indoor scenes.

LSUN quality
LSUN metrics

These results show that Balanced Conic Rectified Flow scales reliably to larger resolutions while maintaining its signature advantages: better sample quality, improved coverage, and strong robustness — all with significantly fewer synthetic pairs than traditional reflow pipelines.

Straighter Flows: Curvature and IVD


Beyond sample quality, Balanced Conic Rectified Flow improves the trajectory straightness of the learned transport. We track both curvature and Initial Velocity Delta (IVD) to assess how smoothly and accurately the flow connects noise and data.

The visualization below shows how curvature and IVD evolve during training. Our method consistently achieves lower values than the original rectified flow, indicating straighter and more stable trajectories even with fewer fake pairs. The right-side trend further shows that adding an extra reflow step (from 2-rectified to 3-rectified flow) continues to reduce curvature and IVD, reinforcing the effectiveness of our training procedure.

Curvature and IVD comparisons

Overall, these results show that Balanced Conic Rectified Flow not only improves FID and recall, but also shapes the underlying vector field into a straighter, more reliable transport map with better-preserved initial velocity directions.

Fine-Tuning with Real Pairs


Balanced Conic Rectified Flow can also be applied as a fine-tuning method for existing rectified flow models. Rather than re-training from scratch, we refine a pretrained model using a small number of real-pair conic reflow updates. In our experiments, we fine-tune the official CIFAR-10 rectified flow checkpoints provided by the original authors, using only 60,000 real pairs.

Fine-tuning quality comparison (image quality)
(a) Image quality comparison between the original RF model and our fine-tuned model.

Even with such a small amount of additional supervision, fine-tuning noticeably improves 1-step generation quality, as shown in Figure (a). The model becomes more aligned with the real data manifold and avoids the synthetic bias accumulated during standard reflow.

Fine-tuning effect on curvature, IVD, recon and p-recon gaps
(b) Fine-tuned models (2-RF and 3-RF) show lower curvature, lower IVD, and reduced reconstruction gaps compared to the original models.

The impact of fine-tuning extends beyond image quality:

  • Straighter trajectories: Both curvature and IVD decrease sharply, indicating that the fine-tuned flow is more consistent and smooth.
  • Reduced bias toward fake samples: Reconstruction and perturbed-reconstruction errors for real and fake images become more similar, showing that the model no longer overfits to self-generated samples.
  • Effective even for deeper reflows: Improvements are consistent for both 2-RF and 3-RF pretrained models.

These results demonstrate that our real-pair conic refinement serves as a practical and lightweight upgrade to existing rectified flow models, improving both generation quality and trajectory straightness with minimal additional cost.

Ablation Studies


We analyze how each component of Balanced Conic Rectified Flow contributes to its stability, geometry, and generation performance. Our ablations focus on the role of Slerp-based conic perturbations and the interaction between real pairs and fake pairs during reflow.

Slerp noise pattern ablation

We compare three perturbation strategies around the inverse noise: linear interpolation, Gaussian perturbations, and our Slerp-based conic noise. Linear interpolation alters the norm and leaves the hypersphere, while naive Gaussian noise ignores the angular structure between directions. In contrast, Slerp preserves the norm and smoothly interpolates angles, producing consistent and geometry-aware conic perturbations. Empirically, the Slerp pattern yields lower curvature, lower IVD, and a smaller reconstruction gap between real and fake samples.

Real vs Real+Fake vs Real+Fake+Slerp ablation

We also examine how the flow behaves when trained with different combinations of real and fake pairs. Real-only supervision reduces drift toward synthetic samples but can overfit to the limited inverse dataset. Mixing real and fake pairs improves diversity, yet without structured perturbations the geometry remains unstable.

The full configuration—real pairs + fake pairs + Slerp-based perturbations— produces the most stable and effective flow: strong anchoring to real data, broad coverage from fake pairs, and consistent geometric regularization. This combination achieves the best generation quality, highest recall, and the most stable curvature and IVD across datasets.

Overall, these ablations confirm that each component of Balanced Conic Rectified Flow plays a distinct and essential role in mitigating distribution drift and producing high-quality, straight trajectories.