Remove cross-resolution alignment

2026-04-08 10:38:22 +08:00
parent c68f896029
commit 9377fc82cd
5 changed files with 83 additions and 105 deletions
--- a/MarsRAG/MarsRAG.tex
+++ b/MarsRAG/MarsRAG.tex
@@ -129,7 +129,6 @@ This section introduces the implementation approach of AreoRAG. As shown in Fig.
 The three modules interact through three explicit coupling points: (1) HySH's spatial alignment is a prerequisite for meaningful interaction entropy computation in PICT; (2) the radial depth difference $\Delta r$ from HySH directly feeds into the PICT feature vector as the resolution disparity signal; and (3) PICT's triage results feed back to boost retrieval priority of scientifically interesting regions in subsequent queries.

 \subsection{Hyperbolic Spatial Hypergraph Construction}\label{sec:HySH}
-
 The AreoRAG method begins by constructing a knowledge structure that can faithfully represent the continuous spatiotemporal topology of planetary multi-source data. Unlike MultiRAG's Multi-source Line Graph (MLG), which relies on discrete text entities and binary triples, we introduce a hypergraph structure embedded in hyperbolic space to jointly address edge explosion and spatial scale hierarchy.

 1) Multi-source Spatial Adapter Parsing: We first design a spatial adapter for each observation data source to parse instrument metadata, spatial footprints, temporal windows, and spectral parameters. For orbital remote sensing data (e.g., HiRISE, CTX, CRISM, MOLA), parsing involves extracting the image footprint geometry, ground sampling distance, and spectral band configuration from PDS labels. For derived data products (e.g., DTMs, mineral abundance maps), parsing extracts provenance links to the source observations and processing parameters. All temporal references are unified to Solar Longitude $L_s$ to enable cross-platform temporal comparison.
@@ -146,7 +145,7 @@ Through the parsed data $D_{Fusion}$, we further extract entities (geological fe

 \begin{equation}
 	\label{equ:planetary science domain schema}
-	\sum_{D_i} \left( \{e_1, e_2, \ldots, e_m\} \sqcup \{r_1, r_2, \ldots, r_n\} \sqcup \{f_{spa,1}^n, \ldots, f_{spa,p}^n\} \right).
+	\sum_{D_i} \left( \{e_1, e_2, \ldots, e_m\} \cup \{r_1, r_2, \ldots, r_n\} \cup \{f_{spa,1}^{n}, \ldots, f_{spa,p}^{n}\} \right).
 \end{equation}

 2) Spatial Observation Hyperedge Formation: Based on the extracted knowledge base, we construct spatial observation hyperedges that bind co-located multi-source observations into single $n$-ary facts. As formalized in Definition 2, each hyperedge $f_{spa}^n$ encapsulates the instrument, spatial footprint, temporal window, spectral bands, target features, and resolution. In a pairwise binary graph, $k$ co-existing spatial entities require $\binom{k}{2} = O(k^2)$ spatial proximity edges. With hyperedges, a single $n$-ary fact binds all $k$ entities, reducing edge complexity to $O(k)$. This directly resolves the edge explosion problem identified in our analysis of MLG.
@@ -161,28 +160,19 @@ where $g(\ell_{res}) = -\log(\ell_{res} / \ell_{max})$ is a monotone decreasing

 This embedding design is motivated by the following observation on the intrinsic geometry of planetary spatial data:

-**Proposition 1** (Spatial Scale-Curvature Correspondence). *The planetary spatial observation hierarchy exhibits tree-like branching: each coarser-resolution observation spatially contains multiple finer-resolution observations. Let $N(\ell)$ denote the number of observations at resolution level $\ell$. For remote sensing data with total survey area $A_{coverage}$:
+\textbf{Observation~1 (Spatial Scale-Curvature Correspondence).} The planetary spatial observation hierarchy exhibits tree-like branching: each coarser-resolution observation spatially contains multiple finer-resolution observations. Let $N(\ell)$ denote the number of observations at resolution level $\ell$. For remote sensing data with total survey area $A_{coverage}$:

 \begin{equation}
 	\label{equ:Spatial Scale-Curvature Correspondence}
 	N(\ell) \propto A_{coverage} / \ell^2.
 \end{equation}

-As resolution $\ell$ decreases (finer scale), $N(\ell)$ grows quadratically, exhibiting the exponential branching characteristic of negative-curvature spaces. Therefore, the spatial scale hierarchy is intrinsically hyperbolic, and Euclidean embedding with polynomial volume growth cannot faithfully represent it.*
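+As the ground sampling distance $\ell$ decreases (finer scale), $N(\ell)$ grows as $\ell^{-2}$. If each level of the hierarchy refines its parent by a factor of at least $b > 1$, so that $\ell_k \le \ell_0 / b^{k}$ at depth $k$, then $N(\ell_k)$ grows at least as $b^{2k}$, i.e., exponentially in hierarchy depth, which is the branching characteristic of negative-curvature spaces. Therefore, the spatial scale hierarchy is intrinsically hyperbolic, and Euclidean embeddings, whose volume grows only polynomially with radius, cannot represent it with low distortion in low dimensions.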

 Through this embedding, global coarse-resolution data (e.g., MOLA topography at $\sim$460 m) is placed near the hyperbolic origin (small radial depth), while local high-resolution data (e.g., HiRISE at 0.3 m) is placed far from the origin (large radial depth). The exponential volume growth of $\mathbb{H}_K^d$ naturally accommodates the exponentially increasing number of observations at finer scales.

-4) Cross-Reference-Frame Alignment: Different orbital missions use slightly different coordinate reference frames. We align all observations to a unified global reference via parallel transport on the hyperbolic manifold:
-
-\begin{equation}
-	\label{equ:Cross-Reference-Frame Alignment}
-	\Phi_{aligned}(e) = \exp_{o_g}\left(\Gamma_{o_k \to o_g}\left(\log_{o_k}(\Phi_k(e))\right)\right),
-\end{equation}
-where $\log_{o_k}$ is the logarithmic map at the local reference origin $o_k$, $\Gamma_{o_k \to o_g}$ is the parallel transport operator along the geodesic from $o_k$ to the global origin $o_g$, and $\exp_{o_g}$ is the exponential map at the global origin. Unlike Euclidean affine transformations, hyperbolic parallel transport preserves geodesic distances and radial depth, ensuring that scale hierarchy information is maintained after cross-frame alignment.
-
 Here, we provide a simple example of hyperbolic spatial hypergraph construction. As shown in Fig.~4, an observation region is covered by three orbital sensors at different resolutions: a CTX mosaic (6 m), a HiRISE strip (0.3 m), and a CRISM spectral cube (18 m). In the HySH, the HiRISE observation (finest resolution) is embedded at the largest radial depth, while the CRISM observation (coarsest resolution) is nearest to the origin. A spatial observation hyperedge binds all three observations and their co-located geological features into a single $n$-ary fact, without requiring $O(k^2)$ pairwise edges.

-
 \subsection{Spatiotemporal Retrieval with Cross-Resolution Aggregation}\label{sec:retrieval}

 After the construction of the hyperbolic spatial hypergraph, the next step is to retrieve query-relevant multi-source spatial evidence. The retrieval process comprises two phases: spatiotemporal evidence extraction and cross-resolution aggregation.
@@ -292,7 +282,7 @@ A lightweight classifier maps the feature vector to conflict type:

 \[\hat{c} = \arg\max_{c \in \{\mathrm{noise}, \mathrm{inst}, \mathrm{scale}, \mathrm{temp}\}} P_\theta(c \mid \mathbf{z}_{conf})\]

-**Proposition 2** (Conflict Type Separability). *The four conflict types are distinguished by orthogonal physical dimensions: $\|\Omega_i - \Omega_j\|$ separates instrument conflicts; $|\log(\ell_{res}^i / \ell_{res}^j)|$ separates scale conflicts; $\Delta\mathcal{T}$ separates temporal conflicts; $\rho_{auth}$ separates noise conflicts. Since these physical features are independent of and complementary to the hidden state features $\mathbf{h}^{(l^*)}_{conf}$ (which encode semantic inconsistency), the four conflict types are linearly separable in the augmented feature space $\mathbf{z}_{conf}$.*
+\textbf{Lemma~1 (Conflict Type Separability).} The four conflict types are distinguished by orthogonal physical dimensions: $\|\Omega_i - \Omega_j\|$ separates instrument conflicts; $|\log(\ell_{res}^i / \ell_{res}^j)|$ separates scale conflicts; $\Delta\mathcal{T}$ separates temporal conflicts; and $\rho_{auth}$ separates noise conflicts. These physical features are independent of and complementary to the hidden-state features $\mathbf{h}^{(l^*)}_{conf}$, which encode semantic inconsistency. If each conflict type is marked by a large value of its own physical feature and small values of the other three, then the four conflict types are linearly separable in the augmented feature space $\mathbf{z}_{conf}$.

 3) Conflict-Aware Confidence Recalibration: Based on the classification result, we recalibrate the node confidence. This is the key departure from MultiRAG's MCC, which uniformly penalizes inconsistency:

@@ -579,7 +569,7 @@ A defining capability of AreoRAG is the ability to preserve scientifically valua

 Table~IV reveals the fundamental difference between AreoRAG and existing methods. MultiRAG achieves a high Noise Rejection Rate (85.7\%) but at the cost of a catastrophically low Conflict Preservation Rate (8.3\%): it filters out 91.7\% of scientifically valuable conflicts as ``unreliable data.'' TruthfulRAG and MetaRAG show similar behavior (CPR of 13.9\% and 11.1\%, respectively), indicating that existing conflict-resolution methods systematically discard scientific anomaly signals.

-In contrast, AreoRAG achieves a CPR of 91.7\% while maintaining the same NRR (85.7\%) as MultiRAG, demonstrating that PICT successfully decouples noise filtering from scientific conflict preservation. The Conflict Classification Accuracy of 84.0\% on the four-category task validates the separability claim in Proposition 2. Error analysis reveals that the primary source of misclassification is between instrument-inherent and scale-dependent conflicts (12.3\% confusion rate), which is expected as both involve observation geometry differences. Noise vs. scientific conflict misclassification is rare (3.7\%), confirming the robustness of the explainable/opaque distinction (Definition 7).
+In contrast, AreoRAG achieves a CPR of 91.7\% while matching MultiRAG's NRR (85.7\%), demonstrating that PICT decouples noise filtering from scientific conflict preservation. The Conflict Classification Accuracy of 84.0\% on the four-category task supports the separability claim in Lemma~1. Error analysis shows that the primary source of misclassification is confusion between instrument-inherent and scale-dependent conflicts (12.3\% confusion rate), which is expected because both involve differences in observation geometry. Misclassification between noise and scientific conflicts is rare (3.7\%), supporting the robustness of the explainable/opaque distinction (Definition~7).

 Furthermore, the F1 score improvement (53.1\% vs. 35.2\% for MultiRAG) demonstrates that preserving scientific conflicts directly benefits answer quality: the LLM can generate more comprehensive and scientifically faithful answers when provided with both agreeing and legitimately disagreeing evidence, accompanied by physical bridging explanations.

@@ -643,7 +633,7 @@ We acknowledge several limitations inherent in the current framework:

 4) \textbf{Generalization to other planetary bodies}: While designed for Mars, the framework's principles (hyperbolic scale hierarchy, physics-informed conflict triage) are applicable to other planetary bodies (Moon, Venus, icy moons). Validation on non-Mars datasets remains future work.

-\subsection{Related Work}
+\section{Related Work}

 \subsection{Graph-Structured Retrieval Augmented Generation}

@@ -662,7 +652,7 @@ Hyperbolic geometry has attracted increasing attention in representation learnin

 In the context of text retrieval, hyperbolic geometry has recently shown strong promise. HypRAG [20] introduces hyperbolic dense retrieval for RAG, developing two model variants in the Lorentz model: a fully hyperbolic transformer (HyTE-FH) and a hybrid architecture (HyTE-H). A key contribution is the Outward Einstein Midpoint (OEM), a geometry-aware pooling operator that provably preserves hierarchical structure during sequence aggregation, overcoming the radial contraction failure of naive Euclidean averaging. HypRAG achieves up to 29\% gains over Euclidean baselines in context relevance on RAGBench, and demonstrates that hyperbolic representations encode document specificity through norm-based separation — with over 20\% radial increase from general to specific concepts. HyperbolicRAG [58] projects embeddings into the Poincar\'e ball to encode hierarchical depth within a static knowledge graph, using dual-space retrieval that fuses Euclidean and hyperbolic rankings. HELM [59] introduces a family of hyperbolic language models that operate entirely in hyperbolic space for text generation, though not specifically targeting retrieval.

-These works establish the viability of hyperbolic geometry for hierarchical text retrieval, but they exclusively address the semantic hierarchy of natural language documents (broad topics → specific entities). No existing work has applied hyperbolic geometry to represent the physical scale hierarchy of scientific observations, where the hierarchy arises not from semantic abstraction but from spatial resolution (coarse global survey → fine local imaging). AreoRAG introduces the scale-curvature correspondence principle (Proposition 1), which establishes that the resolution hierarchy of planetary remote sensing data is intrinsically hyperbolic, and couples spatial resolution with radial depth in the Lorentz model. Furthermore, we extend the OEM pooling operator with resolution-aware radial weighting (Spatial OEM, Eq. 13), ensuring that cross-resolution aggregation preserves fine-scale observational details rather than collapsing them into coarse-resolution summaries.
+These works establish the viability of hyperbolic geometry for hierarchical text retrieval, but they address only the semantic hierarchy of natural language documents (broad topics $\rightarrow$ specific entities). To our knowledge, no existing work has applied hyperbolic geometry to represent the physical scale hierarchy of scientific observations, where the hierarchy arises not from semantic abstraction but from spatial resolution (coarse global survey $\rightarrow$ fine local imaging). AreoRAG introduces the scale-curvature correspondence principle (Observation~1), which argues that the resolution hierarchy of planetary remote sensing data is intrinsically hyperbolic, and couples spatial resolution with radial depth in the Lorentz model. Furthermore, we extend the OEM pooling operator with resolution-aware radial weighting (Spatial OEM, Eq.~13), ensuring that cross-resolution aggregation preserves fine-scale observational details rather than collapsing them into coarse-resolution summaries.


 \subsection{Knowledge Conflict Detection and Resolution in RAG}