22 KiB
III. METHODOLOGY
A. Framework of XXX-RAG
This section elaborates on the implementation approach of XXX-RAG. As shown in Fig. 3, the framework comprises three modules. The first step involves constructing a Hyperbolic Spatial Hypergraph (HySH) from multi-source planetary observation data, achieving unified spatiotemporal representation via n-ary observation hyperedges embedded in hyperbolic space; the second step requires performing spatiotemporal retrieval on the constructed HySH, where hyperbolic spatial proximity encoding and cross-resolution aggregation are employed to extract query-relevant multi-source evidence; the third step involves physics-informed conflict triage (PICT), which detects inter-source conflicts via cross-source interaction entropy, classifies them into four scientific categories, and applies conflict-aware confidence recalibration to preserve scientifically valuable disagreements while filtering noise. Finally, integrating the aforementioned steps to form the XXX-RAG Prompting algorithm, ARP.
B. Hyperbolic Spatial Hypergraph Construction
The XXX-RAG method begins by constructing a knowledge structure that can faithfully represent the continuous spatiotemporal topology of planetary multi-source data. Unlike MultiRAG's Multi-source Line Graph (MLG), which relies on discrete text entities and binary triples, we introduce a hypergraph structure embedded in hyperbolic space to jointly address edge explosion and spatial scale hierarchy.
Specifically, we first design a spatial adapter for each observation data source to parse instrument metadata, spatial footprints, temporal windows, and spectral parameters. For orbital remote sensing data (e.g., HiRISE, CTX, CRISM), parsing involves extracting the image footprint geometry, ground sampling distance, and spectral band configuration from PDS labels. For in-situ data (e.g., rover spectrometers, ground-penetrating radar), parsing extracts the rover traverse coordinates, measurement timestamps in Sol, and instrument-specific parameters such as penetration depth. All temporal references are unified to Solar Longitude L_s to enable cross-platform temporal comparison. The final integration of multi-source spatial data can be expressed as:
D_{Fusion} = \bigcup_{i=1}^{n} A_i^{spa}(D_i) \tag{2}
where A_i^{spa} \in \{Ada_{orbital}, Ada_{insitu}, Ada_{derived}\} represents the spatial adapter parsing functions for orbital, in-situ, and derived data products respectively.
Through the parsed data, we further construct the hyperbolic spatial hypergraph. The construction process involves three key phases: spatial observation hyperedge formation, scale-aware hyperbolic embedding, and cross-reference-frame alignment.
Definition 1. Spatial observation hyperedge. Given a multi-source spatial knowledge hypergraph \mathcal{G}_{hyp} = (\mathcal{E}, \mathcal{R}, \mathcal{F}_{spa}), a spatial observation hyperedge f_{spa}^n \in \mathcal{F}_{spa} is defined as:
f_{spa}^n = (\mathcal{I}, \; \mathcal{P}_{foot}, \; \mathcal{T}_{win}, \; \mathcal{S}_{band}, \; \mathcal{O}_{target}, \; \ell_{res}) \tag{3}
where \mathcal{I} denotes the instrument entity, \mathcal{P}_{foot} \subset \mathbb{S}^2_{Mars} denotes the spatial footprint on the Martian sphere, \mathcal{T}_{win} denotes the temporal acquisition window parameterized in L_s, \mathcal{S}_{band} denotes the spectral band set, \mathcal{O}_{target} denotes the set of target geological features, and \ell_{res} \in \mathbb{R}^+ denotes the ground sampling distance.
Based on the definition, it can be inferred that spatial observation hyperedges achieve high aggregation of co-located multi-source observations. In a pairwise spatial graph, k co-existing spatial entities require \binom{k}{2} = O(k^2) spatial proximity edges. With hyperedges, a single $n$-ary fact binds all k entities, reducing edge complexity to O(k). This directly resolves the edge explosion problem identified in our analysis of MultiRAG's MLG structure.
Definition 2. Scale-aware Lorentz embedding. We represent the spatial observation hypergraph in $d$-dimensional hyperbolic space \mathbb{H}_K^d with constant negative curvature K < 0 using the Lorentz model. The embedding mapping \Phi: \mathcal{F}_{spa} \rightarrow \mathbb{H}_K^d couples the radial depth with spatial resolution:
r\left(\Phi(f_{spa}^n)\right) = \frac{1}{\sqrt{-K}} \cosh\left(\sqrt{-K} \cdot g(\ell_{res})\right) \tag{4}
where g(\ell_{res}) = -\log(\ell_{res} / \ell_{max}) is a monotone decreasing function of resolution, and r(\mathbf{x}) = x_0 denotes the radial depth (intrinsic hyperbolic distance from the origin).
This embedding design is motivated by the following observation on the intrinsic geometry of planetary spatial data:
Proposition 1 (Spatial Scale-Curvature Correspondence). The planetary spatial observation hierarchy exhibits tree-like branching: each coarser-resolution observation spatially contains multiple finer-resolution observations. Let N(\ell) denote the number of observations at resolution level \ell. For remote sensing data with total survey area A_{coverage}:
N(\ell) \propto A_{coverage} / \ell^2 \tag{5}
As resolution \ell decreases (finer scale), N(\ell) grows quadratically, exhibiting the exponential branching characteristic of negative-curvature spaces. Therefore, the spatial scale hierarchy is intrinsically hyperbolic, and Euclidean embedding with polynomial volume growth cannot faithfully represent it.
Through this embedding, global coarse-resolution data (e.g., MOLA topography at ~460m) is placed near the hyperbolic origin (small radial depth), while local high-resolution data (e.g., HiRISE at 0.3m) is placed far from the origin (large radial depth). The exponential volume growth of \mathbb{H}_K^d naturally accommodates the exponentially increasing number of observations at finer scales.
Finally, to address the heterogeneous reference frame problem (orbiter areocentric coordinates vs. rover-centric local coordinates), we align all observations to a global reference via parallel transport on the hyperbolic manifold:
\Phi_{aligned}(e) = \exp_{o_{g}}\left(\Gamma_{o_k \to o_{g}}\left(\log_{o_k}(\Phi_k(e))\right)\right) \tag{6}
where \log_{o_k} is the logarithmic map at the local reference origin o_k, \Gamma_{o_k \to o_{g}} is the parallel transport operator along the geodesic from o_k to the global origin o_g, and \exp_{o_g} is the exponential map at the global origin. Unlike Euclidean affine transformations, hyperbolic parallel transport preserves geodesic distances and radial depth, ensuring that scale hierarchy information is maintained after cross-frame alignment.
Here, we provide a simple example. As shown in Fig. 4, an observation region is covered by three sources at different resolutions: a CTX mosaic (6m), an HiRISE strip (0.3m), and a CRISM spectral cube (18m). In the hyperbolic spatial hypergraph, the HiRISE observation (finest resolution) is embedded at the largest radial depth, while the CRISM observation (coarsest resolution) is nearest to the origin. A spatial observation hyperedge binds all three observations and their co-located geological features into a single $n$-ary fact, without requiring O(k^2) pairwise edges.
C. Spatiotemporal Retrieval with Cross-Resolution Aggregation
After the construction of the hyperbolic spatial hypergraph, the next step is to retrieve query-relevant multi-source spatial evidence. Given a user query q, we first employ the LLM to extract spatial intent, including target entities, spatial constraints (footprint, region), temporal constraints (L_s range, Sol range), and resolution preferences. These are denoted as query elements \mathcal{K}_q.
Subsequently, we perform spatiotemporal retrieval on the hypergraph. For each topic entity e_s \in \mathcal{E}_q extracted from the query, we retrieve its incident spatial observation hyperedges \mathcal{F}_{e_s} = \{f_{spa}^n \in \mathcal{F}_{spa} : e_s \in f_{spa}^n\} and derive pseudo-binary triples (e_h, f_{spa}^n, e_t) for pairwise reasoning. For each candidate triple, we compute a spatiotemporal encoding that fuses semantic, structural, and physical-spatial signals:
\mathbf{x} = \left[\varphi(q) \| \varphi(e_h) \| \varphi(f_{spa}^n) \| \varphi(e_t) \| \delta(e_h, f_{spa}^n, e_t) \| \psi_{geo}(e_h, e_t)\right] \tag{7}
where \varphi denotes a text embedding model, \delta denotes a structural proximity encoding adapted from [6] to operate on hyperedges, and \psi_{geo} is the hyperbolic spatial encoding defined as:
\psi_{geo}(e_h, e_t) = \left[d_K\left(\Phi(e_h), \Phi(e_t)\right), \; \Delta r(e_h, e_t), \; \cos\theta_{bearing}\right] \tag{8}
Here d_K is the geodesic distance in \mathbb{H}_K^d capturing physical proximity, \Delta r = |r(\Phi(e_h)) - r(\Phi(e_t))| encodes scale difference via radial depth gap, and \cos\theta_{bearing} encodes directional relationship. A lightweight MLP classifier f_\theta then scores the plausibility of each candidate triple:
\text{score}(e_h, f_{spa}^n, e_t) = f_\theta(\mathbf{x}) \in [0, 1] \tag{9}
Top-scored triples are retained and their tail entities form the frontier for next-hop expansion, following an adaptive search strategy with density-aware thresholding.
After retrieval, the selected multi-source evidence typically spans multiple resolutions. To aggregate these into a unified representation without losing fine-scale information, we introduce the Spatial Outward Einstein Midpoint (Spatial OEM). The motivation stems from a known failure mode: naively averaging hyperbolic embeddings collapses representations toward the origin, destroying the hierarchical structure encoded in radial depth [7].
Given spatial observation hyperedge embeddings \{\Phi(f_i)\}_{i=1}^n \subset \mathbb{H}_K^d with query-relevance weights w_i and resolution-aware radial weighting \phi_{res}(f_i) = r(\Phi(f_i))^p:
\mathbf{m}_{K,p}^{Spa\text{-}OEM} = \Pi_K\left(\frac{\sum_{i=1}^{n} w_i \cdot \phi_{res}(f_i) \cdot \lambda_i \cdot \Phi(f_i)}{\sum_{i=1}^{n} w_i \cdot \phi_{res}(f_i) \cdot \lambda_i}\right) \tag{10}
where \lambda_i = \Phi(f_i)_0 is the Lorentz factor and \Pi_K denotes reprojection onto \mathbb{H}_K^d.
Theorem 1 (Spatial OEM Outward Bias). For p \geq 1, the Spatial OEM satisfies:
r(\mathbf{m}_{K,p}^{Spa\text{-}OEM}) \geq r(\mathbf{m}_K^{Ein})
where \mathbf{m}_K^{Ein} is the standard Einstein midpoint (p = 0).
Proof. The OEM weights \tilde{w}_i \propto w_i \cdot r(\Phi(f_i))^{p+1} concentrate more mass on high-radius points than the Einstein weights w_i \cdot r(\Phi(f_i)). By the Chebyshev sum inequality applied to the co-monotonic sequences a_i = r(\Phi(f_i))^{p+1} and b_i = r(\Phi(f_i)), the pre-projection time component satisfies \tilde{v}_0 \geq \bar{r}_w (weighted mean radius). Since reprojection \Pi_K preserves the ordering of time components, the result follows. \square
Notably, the outward bias guarantees that high-resolution observations dominate the aggregated representation. This is essential for planetary science retrieval: when a user queries a specific geological feature, the aggregated evidence should preserve the fine-scale observational details rather than being smoothed into a coarse-resolution summary.
D. Physics-Informed Conflict Triage
We define the multi-source spatial evidence retrieved in a single query as observation-grounded homologous data. Although targeting the same query object, these data often provide inconsistent factual statements due to differences in instrument principles, observation geometry, and acquisition epochs. Unlike MultiRAG's MCC module, which assumes that inconsistency indicates unreliability and employs mutual information entropy to filter conflicting nodes, we adopt a fundamentally different paradigm: physics-informed conflict triage (PICT), which classifies conflicts by their physical origin and applies differentiated processing strategies.
- Observation-Grounded Conflict Formalization: The first stage establishes a formal framework for reasoning about conflicts in the context of physical observations. Each knowledge source carries not only factual content but also a physical measurement model that constrains what it can observe.
Definition 3. Observation-grounded knowledge source. A planetary observation knowledge source is defined as \mathcal{K}_s = (\mathcal{I}_s, \Omega_s, F(\mathcal{K}_s), \mathcal{M}_s), where \mathcal{I}_s denotes the instrument, \Omega_s = (\ell_{res}, \lambda_{band}, \theta_{view}, d_{pen}) denotes observation geometry parameters (spatial resolution, spectral band, viewing angle, penetration depth), F(\mathcal{K}_s) denotes the set of atomic factual statements, and \mathcal{M}_s denotes the physical measurement model mapping target properties through observation constraints to observable facts.
For two sources \mathcal{K}_i and \mathcal{K}_j (i \neq j), the pairwise conflict set is:
\mathcal{C}_{i,j} = \{(\psi_i, \psi_j) \mid \psi_i \in F(\mathcal{K}_i), \psi_j \in F(\mathcal{K}_j), \psi_i \bot \psi_j\} \tag{11}
where \psi_i \bot \psi_j denotes semantic incompatibility. We further introduce the central distinction of PICT:
Definition 4. Explainable conflict and opaque conflict. A pairwise conflict (\psi_i, \psi_j) \in \mathcal{C}_{i,j} is explainable if there exists a physical bridging function \mathcal{B} such that:
\mathcal{B}(\Omega_i, \Omega_j, \mathcal{M}_i, \mathcal{M}_j) \models \neg(\psi_i \bot \psi_j) \tag{12}
i.e., the apparent inconsistency is resolvable by accounting for observation constraint differences. Otherwise, the conflict is opaque. Based on this distinction, we define four conflict categories:
| Category | Condition | Strategy |
|---|---|---|
Noise (\mathcal{C}^{noise}) |
Opaque, with significant source authority disparity | Filter low-authority source |
Instrument-Inherent (\mathcal{C}^{inst}) |
Explainable via \Omega_i \neq \Omega_j |
Preserve with physical explanation |
Scale-Dependent (\mathcal{C}^{scale}) |
Explainable via \ell_{res}^i \neq \ell_{res}^j |
Preserve with cross-scale linkage |
Temporal-Evolution (\mathcal{C}^{temp}) |
Explainable via \mathcal{T}_i \neq \mathcal{T}_j |
Preserve with temporal ordering |
- Cross-Source Interaction Entropy: In the second stage, we detect conflicts by measuring the information-theoretic interaction effect when two sources are jointly presented to the LLM. Unlike TruthfulRAG [9], which compares retrieval-augmented entropy against parametric-only entropy (
\Delta H_p = H(P_{aug}) - H(P_{param})), this formulation is inapplicable to our setting where all knowledge is external. We instead measure the cross-source interaction:
\mathcal{H}_{inter}(p_i, p_j \mid q) = H\left(P(\text{ans} \mid q, p_i \oplus p_j)\right) - \frac{1}{2}\left[H\left(P(\text{ans} \mid q, p_i)\right) + H\left(P(\text{ans} \mid q, p_j)\right)\right] \tag{13}
where H(\cdot) is the token-averaged entropy over top-k candidate tokens:
H\left(P(\text{ans} \mid \text{context})\right) = -\frac{1}{|l|}\sum_{t=1}^{|l|}\sum_{i=1}^{k} pr_i^{(t)} \log_2 pr_i^{(t)} \tag{14}
and p_i \oplus p_j denotes the concatenation of both reasoning paths. Positive values of \mathcal{H}_{inter} (super-additive uncertainty) indicate that the two sources contradict each other; near-zero values indicate independence or consistency; negative values (sub-additive) indicate mutual complementarity. Reasoning path pairs exhibiting interaction entropy exceeding a predefined threshold \epsilon are classified as detected conflicts:
\mathcal{C}^{detected} = \{(\psi_i, \psi_j) \mid \mathcal{H}_{inter}(p_i, p_j \mid q) > \epsilon\} \tag{15}
- Conflict Classification and Confidence Recalibration: In the third stage, each detected conflict is classified and the node confidence is recalibrated accordingly. For each detected conflict, we construct a feature vector that fuses information-theoretic, physical, and neural signals:
\mathbf{z}_{conf} = \left[\mathcal{H}_{inter}, \; \|\Omega_i - \Omega_j\|, \; |\log(\ell_{res}^i / \ell_{res}^j)|, \; \Delta\mathcal{T}, \; \rho_{auth}(i,j), \; \mathbf{h}^{(l^*)}_{conf}\right] \tag{16}
where \mathbf{h}^{(l^*)}_{conf} is the LLM hidden state at the conflict encoding layer (mid-to-late layers where conflict signals concentrate, following the depth localization finding of [8]). A lightweight classifier maps the feature vector to conflict type:
\hat{c} = \arg\max_{c \in \{noise, inst, scale, temp\}} P_\theta(c \mid \mathbf{z}_{conf}) \tag{17}
Proposition 2 (Conflict Type Separability). The four conflict types are distinguished by orthogonal physical dimensions: \|\Omega_i - \Omega_j\| separates instrument conflicts; |\log(\ell_{res}^i / \ell_{res}^j)| separates scale conflicts; \Delta\mathcal{T} separates temporal conflicts; \rho_{auth} separates noise conflicts. Since these physical features are independent of and complementary to the hidden state features \mathbf{h}^{(l^*)}_{conf} (which encode semantic inconsistency and achieve > 93% AUC with a linear probe [8]), the four conflict types are linearly separable in the augmented feature space \mathbf{z}_{conf}.
Based on the classification result, we recalibrate the node confidence. This is the key departure from MultiRAG's MCC, which uniformly penalizes inconsistency:
C_{triage}(v) = \begin{cases} C_{MCC}(v) & \text{if } v \notin \mathcal{C}^{detected} \\ \alpha \cdot C_{MCC}(v) + (1-\alpha) \cdot \eta & \text{if } \hat{c} = noise \\ C_{MCC}(v) + \beta \cdot \mathcal{H}_{inter}^{-1} & \text{if } \hat{c} \in \{inst, scale\} \\ C_{MCC}(v) \cdot \gamma(|\Delta\mathcal{T}|) & \text{if } \hat{c} = temp \end{cases} \tag{18}
where \eta < 0 is a penalty term for noise conflicts, \beta > 0 is a boost coefficient for scientifically explainable conflicts, and \gamma(|\Delta\mathcal{T}|) is a time-decay weighting function that prioritizes recent observations while preserving temporal evolution signals.
Theorem 2 (Anti-Over-Smoothing Guarantee). Let V_{sci} \subset V denote the set of nodes involved in explainable scientific conflicts (\mathcal{C}^{inst} \cup \mathcal{C}^{scale} \cup \mathcal{C}^{temp}). Under PICT with \beta > 0:
C_{triage}(v) > C_{MCC}(v) \quad \forall v \in V_{sci} \tag{19}
Proof. For v \in \mathcal{C}^{inst} \cup \mathcal{C}^{scale}: C_{triage}(v) = C_{MCC}(v) + \beta \cdot \mathcal{H}_{inter}^{-1}. Since \beta > 0 and \mathcal{H}_{inter} > \epsilon > 0 (by the detection threshold in Eq. 15), \beta \cdot \mathcal{H}_{inter}^{-1} > 0, thus C_{triage}(v) > C_{MCC}(v). For v \in \mathcal{C}^{temp}: \gamma(|\Delta\mathcal{T}|) > 1 for temporal contrasts with scientific significance, ensuring amplification. \square
This theorem guarantees that scientifically valuable conflict nodes can never be filtered out by the confidence mechanism, directly addressing the over-smoothing problem.
Ultimately, we design the conflict triage algorithm, PICT, to replace the MCC of MultiRAG. For noise conflicts, the low-authority source is filtered (compatible with the original MCC logic). For instrument-inherent and scale-dependent conflicts, both sources are preserved with a physical bridging explanation \mathcal{B}(\Omega_i, \Omega_j) appended to the context. For temporal-evolution conflicts, a temporal ordering is constructed. All preserved evidence carries provenance metadata (DataID, source institution, instrument identity, observation timestamp) to ensure scientific traceability.
E. XXX-RAG Prompting
We propose the XXX-RAG Prompting (ARP) algorithm for multi-source planetary spatial data retrieval. Given a user query q, the LLM is first employed to extract entities, spatial constraints, and temporal constraints, generating corresponding logical and spatial relationships. The observation data then undergoes multi-source spatial adapter parsing to derive normalized datasets, followed by constructing a Hyperbolic Spatial Hypergraph (HySH) for spatiotemporal knowledge aggregation. Further, spatiotemporal retrieval with Spatial OEM aggregation is performed to obtain multi-source spatial evidence. Finally, by leveraging the PICT mechanism, conflict detection and triage are executed, and the triage-calibrated confidence is computed to enhance the reliability of the answer while preserving scientific conflicts. The results are embedded into the context of the LLM, together with provenance and conflict explanations, to generate a scientifically faithful retrieval answer.
Algorithm 1. XXX-RAG Prompting (ARP)
procedure ARP$(q)$
\quad \mathcal{E}_q, \mathcal{R}_q, \mathcal{P}_{foot}, \mathcal{T}_{win} \leftarrow Spatial Intent Extraction$(q)$
\quad D_q \leftarrow Multi-source Spatial Adapter Parsing$(D)$
\quad \mathcal{G}_{hyp} \leftarrow HySH Construction$(D_q)$ \quad\triangleright Eq. 3-6
\quad \mathcal{T}_q \leftarrow Spatiotemporal Retrieval$(\mathcal{G}_{hyp}, \mathcal{E}_q)$ \quad\triangleright Eq. 7-9
\quad \mathbf{m}_{agg} \leftarrow Spatial OEM Aggregation$(\mathcal{T}_q)$ \quad\triangleright Eq. 10
\quad \mathcal{C}^{detected} \leftarrow Cross-Source Interaction Entropy$(\mathcal{T}_q, q)$ \quad\triangleright Eq. 13-15
\quad for (\psi_i, \psi_j) \in \mathcal{C}^{detected} do
\quad\quad \hat{c} \leftarrow Conflict Classification$(\mathbf{z}_{conf})$ \quad\triangleright Eq. 16-17
\quad\quad C_{triage}(v) \leftarrow Confidence Recalibration$(v, \hat{c})$ \quad\triangleright Eq. 18
\quad end for
\quad Context \leftarrow Differential Context Construction$(q, \mathcal{T}_q, \hat{c})$
\quad Answer \leftarrow LLM$(q \oplus$ Context \oplus Provenance$)$
\quad return Answer
end procedure
It should be noted that the ARP algorithm integrates the HySH and PICT modules in a sequential pipeline: HySH provides spatially aligned multi-source evidence, which PICT then evaluates for conflict semantics. The two modules interact through three coupling points: (1) spatial alignment (Eq. 6) is a prerequisite for meaningful interaction entropy computation (Eq. 13); (2) the radial depth difference \Delta r from HySH (Eq. 8) directly feeds into the PICT feature vector (Eq. 16) as the resolution disparity signal; (3) triage results feed back to boost retrieval priority of scientifically interesting regions in subsequent queries.