Mars-RAG-paper/paper_preliminary_methodology.md
2026-04-02 10:24:22 +08:00


II. PRELIMINARY

In the field of planetary spatial knowledge retrieval, the primary challenges include faithfully representing continuous spatiotemporal relationships across heterogeneous observation sources and achieving reliable retrieval under inherent inter-source scientific conflicts. This section introduces the core elements of our approach and precisely defines the problems we address.

Let Q = \{q_1, q_2, \ldots, q_n\} be the set of query instances, where each q_i corresponds to a distinct planetary science query. Let \mathcal{E} = \{e_1, e_2, \ldots, e_m\} be the set of entities in the spatial knowledge hypergraph, where each e_j represents a geological feature, instrument, or observation product. Let \mathcal{R} = \{r_1, r_2, \ldots, r_p\} be the set of relationships, and let \mathcal{F} = \{f_1^n, f_2^n, \ldots, f_s^n\} be the set of $n$-ary relational facts (hyperedges). Let D = \{d_1, d_2, \ldots, d_t\} be the set of observation data products, where each d_l represents an observation record from a specific instrument. We define the spatially-grounded knowledge-guided retrieval augmented generation problem as follows:

\arg \max_{d_l \in D} \text{LLM}(q_i, d_l), \quad \sum_{e_j \in \mathcal{E}} \sum_{f_k^n \in \mathcal{F}} \text{HG}(e_j, f_k^n, d_l) \cdot \mathcal{S}_{geo}(q_i, d_l) \tag{1}

where \text{LLM}(q_i, d_l) denotes the relevance score between query q_i and document d_l assessed by the LLM, \text{HG}(e_j, f_k^n, d_l) represents the degree of match between entity e_j, $n$-ary fact f_k^n, and document d_l in the hypergraph, and \mathcal{S}_{geo}(q_i, d_l) is a spatial compatibility function that ensures the retrieved evidence satisfies the geospatial constraints (footprint overlap, temporal window, resolution range) specified in the query.

Furthermore, we optimize the knowledge construction and retrieval modules by introducing a hyperbolic spatial hypergraph to achieve spatially faithful knowledge aggregation and physics-informed conflict handling. Specifically, the proposed approach is formally defined through the following definitions.

Definition 1. Multi-source planetary observation data. Given a set of observation platforms \mathcal{H} (e.g., MRO, Mars Express, Tianwen-1, Curiosity, Zhurong), each observation data product is described by D = \{\mathcal{I}, \mathcal{P}_{foot}, \mathcal{T}_{win}, \mathcal{S}_{band}, c, \text{meta}\}, where \mathcal{I} denotes the instrument identity, \mathcal{P}_{foot} \subset \mathbb{S}^2_{Mars} denotes the spatial footprint on the Martian surface, \mathcal{T}_{win} denotes the temporal acquisition window parameterized by Solar Longitude L_s, \mathcal{S}_{band} denotes the spectral band configuration, c represents the observation content (image, spectrum, or derived product), and meta represents the PDS/CNSA metadata. A multi-source spatial adapter parsing algorithm then yields the normalized data \widehat{D} = \{\text{id}, \mathcal{I}, \mathcal{P}_{foot}, \mathcal{T}_{win}, \mathcal{S}_{band}, \ell_{res}, \text{jsc}, \text{meta}\}, where id is the unique identifier, \ell_{res} \in \mathbb{R}^+ denotes the ground sampling distance (spatial resolution), and jsc denotes the observation content stored as JSON-LD for linked data interoperability.

Definition 2. $N$-ary spatial knowledge hypergraph. An $n$-ary spatial knowledge hypergraph is defined as \mathcal{G}_{hyp} = (\mathcal{E}, \mathcal{R}, \mathcal{F}_{spa}), where \mathcal{E} denotes the entity set, \mathcal{R} denotes the relation set, and \mathcal{F}_{spa} denotes the set of spatial observation hyperedges. Each spatial observation hyperedge f_{spa}^n \in \mathcal{F}_{spa} binds multiple entities and observation parameters into a single $n$-ary relational fact:

f_{spa}^n = (\mathcal{I}, \; \mathcal{P}_{foot}, \; \mathcal{T}_{win}, \; \mathcal{S}_{band}, \; \mathcal{O}_{target}, \; \ell_{res}) \tag{2}

where \mathcal{O}_{target} denotes the set of target geological features. Unlike binary knowledge graphs where k co-located entities require \binom{k}{2} = O(k^2) pairwise edges, a single $n$-ary hyperedge binds all k entities with O(k) complexity, directly resolving the edge explosion problem.
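To make the edge-count argument concrete, the following minimal sketch models one spatial observation hyperedge as a plain record and compares the number of pairwise edges with the number of hyperedge incidences for k co-located entities. The class name `SpatialHyperedge`, its field names, and all sample values are hypothetical, and the footprint is reduced to a longitude/latitude bounding box purely for brevity.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class SpatialHyperedge:
    """One n-ary fact f_spa^n (Eq. 2). Field names are illustrative only."""
    instrument: str          # I
    footprint: tuple         # P_foot, simplified to (lon_min, lat_min, lon_max, lat_max)
    ls_window: tuple         # T_win as (L_s start, L_s end) in degrees
    bands: tuple             # S_band
    targets: frozenset       # O_target
    resolution_m: float      # l_res


def pairwise_edges(k: int) -> int:
    """Edges needed by a binary graph to link k co-located entities."""
    return comb(k, 2)


def hyperedge_incidences(k: int) -> int:
    """Entity-to-hyperedge incidences when one hyperedge binds all k entities."""
    return k


# Made-up sample values, not taken from any real data product.
edge = SpatialHyperedge(
    instrument="HiRISE",
    footprint=(77.3, 18.3, 77.6, 18.6),
    ls_window=(45.0, 45.1),
    bands=("RED",),
    targets=frozenset({"delta front", "channel", "crater rim"}),
    resolution_m=0.3,
)

for k in (3, 10, 100):
    print(k, pairwise_edges(k), hyperedge_incidences(k))
```

The gap widens quickly with k, which is the edge explosion the definition refers to.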

Definition 3. Hyperbolic space embedding. We represent \mathcal{G}_{hyp} in $d$-dimensional hyperbolic space \mathbb{H}_K^d with constant negative curvature K < 0 using the Lorentz (hyperboloid) model. The hyperbolic space is realized as:

\mathbb{H}_K^d = \left\{ \mathbf{x} \in \mathbb{R}^{d+1} \mid \langle \mathbf{x}, \mathbf{x} \rangle_L = \frac{1}{K}, \; x_0 > 0 \right\} \tag{3}

where \langle \mathbf{x}, \mathbf{y} \rangle_L = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i is the Lorentzian inner product. The geodesic distance between two points \mathbf{x}, \mathbf{y} \in \mathbb{H}_K^d is d_K(\mathbf{x}, \mathbf{y}) = \frac{1}{\sqrt{-K}} \cosh^{-1}(K \langle \mathbf{x}, \mathbf{y} \rangle_L). The radial depth r(\mathbf{x}) = x_0 increases monotonically with the geodesic distance from the origin \mathbf{o} = (1/\sqrt{-K}, 0, \ldots, 0), since x_0 = \frac{1}{\sqrt{-K}} \cosh(\sqrt{-K} \, d_K(\mathbf{o}, \mathbf{x})). It therefore serves as a proxy for hierarchical specificity: entities near the origin represent coarse, global-scale features, while those at large radial depth represent fine-scale, local observations.
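The quantities in Definition 3 can be checked numerically with a few lines of code. The sketch below is a minimal illustration using only the standard library; the helper names `lorentz_inner`, `lift`, and `geodesic_distance` are our own and are not part of any specific library.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(spatial, K=-1.0):
    """Lift spatial coordinates onto H_K^d by solving <x, x>_L = 1/K for x0 > 0."""
    x0 = math.sqrt(sum(s * s for s in spatial) - 1.0 / K)
    return [x0, *spatial]


def geodesic_distance(x, y, K=-1.0):
    """d_K(x, y) = arcosh(K * <x, y>_L) / sqrt(-K)."""
    arg = max(1.0, K * lorentz_inner(x, y))  # clamp guards against rounding below 1
    return math.acosh(arg) / math.sqrt(-K)


# Curvature K = -1: a point at geodesic distance 1 from the origin.
origin = lift([0.0, 0.0])
point = lift([math.sinh(1.0), 0.0])
print(geodesic_distance(origin, point))

# Curvature K = -4: a point at geodesic distance 0.5 from the origin.
origin_k4 = lift([0.0, 0.0], K=-4.0)
point_k4 = lift([0.5 * math.sinh(1.0), 0.0], K=-4.0)
print(geodesic_distance(origin_k4, point_k4, K=-4.0))
```

Both points satisfy the hyperboloid constraint of Eq. 3 by construction, since `lift` solves for the time coordinate directly.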

Definition 4. Observation-grounded homologous data. For a query Q(q, \mathcal{G}_{hyp}) on the spatial hypergraph \mathcal{G}_{hyp}, the multi-source spatial evidence retrieved in a single query is defined as observation-grounded homologous data. Two observations v_1 and v_2 in \mathcal{G}_{hyp} are observation-grounded homologous if and only if (a) they belong to the same retrieval candidate set, and (b) their spatial footprints overlap, i.e., \mathcal{P}_{foot}(v_1) \cap \mathcal{P}_{foot}(v_2) \neq \varnothing.

Definition 5. Observation-grounded knowledge source. A planetary observation knowledge source is defined as \mathcal{K}_s = (\mathcal{I}_s, \Omega_s, F(\mathcal{K}_s), \mathcal{M}_s), where \mathcal{I}_s denotes the instrument, \Omega_s = (\ell_{res}, \lambda_{band}, \theta_{view}, d_{pen}) denotes the observation geometry parameters (spatial resolution, spectral band, viewing angle, penetration depth), F(\mathcal{K}_s) denotes the set of atomic factual statements, and \mathcal{M}_s denotes the physical measurement model that maps target properties through observation constraints to observable facts.

Definition 6. Conflict triage confidence. For observation-grounded homologous data obtained from the spatial hypergraph, the conflict triage confidence integrates two levels of assessment: (a) cross-source interaction entropy to detect inter-source conflicts, and (b) physics-informed conflict classification to determine whether detected conflicts represent noise to be filtered or scientifically meaningful observational divergences to be preserved. Unlike conventional candidate confidence [14] that uniformly penalizes inconsistency, conflict triage confidence applies differentiated recalibration based on the physical origin of each conflict.

III. METHODOLOGY

A. Framework of AreoRAG

This section elaborates on the implementation of AreoRAG. As shown in Fig. 3, the framework comprises three tightly coupled modules. The first constructs a Hyperbolic Spatial Hypergraph (HySH) from multi-source planetary observation data, achieving unified spatiotemporal representation via $n$-ary observation hyperedges embedded in hyperbolic space (Section III-B). The second performs spatiotemporal retrieval on the constructed HySH, where hyperbolic spatial proximity encoding and cross-resolution aggregation via the Spatial Outward Einstein Midpoint extract query-relevant multi-source evidence (Section III-C). The third applies Physics-Informed Conflict Triage (PICT), which detects inter-source conflicts via cross-source interaction entropy, classifies them into four scientific categories, and executes conflict-aware confidence recalibration to preserve scientifically valuable disagreements while filtering noise (Section III-D). Finally, these steps are integrated into the AreoRAG Prompting (ARP) algorithm (Section III-E).

The three modules interact through three explicit coupling points: (1) HySH's spatial alignment is a prerequisite for meaningful interaction entropy computation in PICT; (2) the radial depth difference \Delta r from HySH directly feeds into the PICT feature vector as the resolution disparity signal; and (3) PICT's triage results feed back to boost retrieval priority of scientifically interesting regions in subsequent queries.

B. Hyperbolic Spatial Hypergraph Construction

The AreoRAG method begins by constructing a knowledge structure that can faithfully represent the continuous spatiotemporal topology of planetary multi-source data. Unlike MultiRAG's Multi-source Line Graph (MLG), which relies on discrete text entities and binary triples, we introduce a hypergraph structure embedded in hyperbolic space to jointly address edge explosion and spatial scale hierarchy.

1) Multi-source Spatial Adapter Parsing: We first design a spatial adapter for each observation data source to parse instrument metadata, spatial footprints, temporal windows, and spectral parameters. For orbital remote sensing data (e.g., HiRISE, CTX, CRISM), parsing involves extracting the image footprint geometry, ground sampling distance, and spectral band configuration from PDS labels. For in-situ data (e.g., rover spectrometers, ground-penetrating radar), parsing extracts the rover traverse coordinates, measurement timestamps in Sol, and instrument-specific parameters such as penetration depth. All temporal references are unified to Solar Longitude L_s to enable cross-platform temporal comparison. For derived data products (e.g., DTMs, mineral abundance maps), parsing extracts provenance links to the source observations and processing parameters.

The final integration of multi-source spatial data can be expressed as:

D_{Fusion} = \bigcup_{i=1}^{n} A_i^{spa}(D_i) \tag{4}

where A_i^{spa} \in \{Ada_{orbital}, Ada_{insitu}, Ada_{derived}\} represents the spatial adapter parsing functions for orbital, in-situ, and derived data products respectively, and D_i represents the original observation datasets from different platforms.

Through the parsed data D_{Fusion}, we further extract entities (geological features, mineral signatures, topographic structures), relationships (spatial containment, temporal succession, compositional association), and observation-specific attributes. The knowledge extraction process employs LLM-based entity recognition guided by a planetary science domain schema:

KB = \sum_{D_i} \left( \{e_1, e_2, \ldots, e_m\} \sqcup \{r_1, r_2, \ldots, r_p\} \sqcup \{f_{spa,1}^n, \ldots, f_{spa,s}^n\} \right) \tag{5}

2) Spatial Observation Hyperedge Formation: Based on the extracted knowledge base, we construct spatial observation hyperedges that bind co-located multi-source observations into single $n$-ary facts. As formalized in Definition 2, each hyperedge f_{spa}^n encapsulates the instrument, spatial footprint, temporal window, spectral bands, target features, and resolution. In a pairwise binary graph, k co-existing spatial entities require \binom{k}{2} = O(k^2) spatial proximity edges. With hyperedges, a single $n$-ary fact binds all k entities, reducing edge complexity to O(k). This directly resolves the edge explosion problem identified in our analysis of MLG.

3) Scale-Aware Lorentz Embedding: We embed the spatial observation hypergraph in $d$-dimensional hyperbolic space \mathbb{H}_K^d using the Lorentz model (Definition 3). The key innovation is coupling the radial depth with spatial resolution through an embedding mapping \Phi: \mathcal{F}_{spa} \rightarrow \mathbb{H}_K^d:

r\left(\Phi(f_{spa}^n)\right) = \frac{1}{\sqrt{-K}} \cosh\left(\sqrt{-K} \cdot g(\ell_{res})\right) \tag{6}

where g(\ell_{res}) = -\log(\ell_{res} / \ell_{max}) is a monotone decreasing function of resolution, and r(\mathbf{x}) = x_0 denotes the radial depth.
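As a small worked example of Eq. 6, the sketch below computes the radial depth assigned to three ground sampling distances. It assumes K = -1 and takes \ell_{max} = 460 m (the MOLA value quoted in this section) as the reference resolution; both choices, and the function names, are assumptions made for illustration.

```python
import math


def scale_depth(res_m, res_max_m):
    """g(l_res) = -log(l_res / l_max); zero at the coarsest resolution."""
    return -math.log(res_m / res_max_m)


def radial_depth(res_m, res_max_m, K=-1.0):
    """Eq. 6: x0 coordinate for an observation with ground sampling distance res_m."""
    s = math.sqrt(-K)
    return math.cosh(s * scale_depth(res_m, res_max_m)) / s


RES_MAX_M = 460.0  # assumed reference resolution
for name, res in (("MOLA", 460.0), ("CTX", 6.0), ("HiRISE", 0.3)):
    print(name, scale_depth(res, RES_MAX_M), radial_depth(res, RES_MAX_M))
```

Under these assumptions the coarsest product sits at the minimum depth x_0 = 1, and each finer product lands strictly farther from the origin.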

This embedding design is motivated by the following observation on the intrinsic geometry of planetary spatial data:

Proposition 1 (Spatial Scale-Curvature Correspondence). The planetary spatial observation hierarchy exhibits tree-like branching: each coarser-resolution observation spatially contains multiple finer-resolution observations. Let N(\ell) denote the number of observations at resolution level \ell. For remote sensing data with total survey area A_{coverage}:

N(\ell) \propto A_{coverage} / \ell^2 \tag{7}

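As the ground sampling distance \ell decreases (finer scale), N(\ell) grows quadratically in 1/\ell. Expressed in the depth coordinate g(\ell) = -\log(\ell / \ell_{max}) of Eq. 6, this reads N \propto e^{2g}, i.e., exponential growth with depth, which matches the exponential volume growth of negative-curvature spaces. The spatial scale hierarchy is therefore intrinsically hyperbolic, and a Euclidean embedding, whose volume grows only polynomially with radius, cannot represent it faithfully.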

Through this embedding, global coarse-resolution data (e.g., MOLA topography at ~460 m) is placed near the hyperbolic origin (small radial depth), while local high-resolution data (e.g., HiRISE at 0.3 m) is placed far from the origin (large radial depth). The exponential volume growth of \mathbb{H}_K^d naturally accommodates the exponentially increasing number of observations at finer scales.

4) Cross-Reference-Frame Alignment: To address the heterogeneous reference frame problem (orbiter areocentric coordinates vs. rover-centric local coordinates), we align all observations to a global reference via parallel transport on the hyperbolic manifold:

\Phi_{aligned}(e) = \exp_{o_g}\left(\Gamma_{o_k \to o_g}\left(\log_{o_k}(\Phi_k(e))\right)\right) \tag{8}

where \log_{o_k} is the logarithmic map at the local reference origin o_k, \Gamma_{o_k \to o_g} is the parallel transport operator along the geodesic from o_k to the global origin o_g, and \exp_{o_g} is the exponential map at the global origin. Because parallel transport is a linear isometry between tangent spaces, the geodesic distance from each point to its reference origin, and hence its radial depth relative to that origin, is unchanged by the alignment. Unlike a Euclidean affine transformation, the alignment therefore maintains scale hierarchy information across frames.
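The three maps in Eq. 8 have closed forms in the Lorentz model. The sketch below is a minimal illustration restricted to K = -1, using the standard closed-form expressions for the logarithmic map, parallel transport, and exponential map on the hyperboloid; the function names and the sample points are hypothetical, and this is not claimed to be the implementation used in AreoRAG.

```python
import math


def inner(x, y):
    """Lorentzian inner product."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def dist(x, y):
    """Geodesic distance for K = -1."""
    return math.acosh(max(1.0, -inner(x, y)))


def log_map(o, x):
    """Tangent vector at o that points toward x and has Lorentz norm d(o, x)."""
    alpha = max(1.0, -inner(o, x))                 # cosh of the distance
    u = [xi - alpha * oi for xi, oi in zip(x, o)]  # part of x tangent at o
    norm = math.sqrt(max(inner(u, u), 0.0))
    if norm == 0.0:
        return [0.0] * len(x)
    return [math.acosh(alpha) * ui / norm for ui in u]


def transport(x, y, v):
    """Parallel transport of a tangent vector v at x to the tangent space at y."""
    coef = inner(y, v) / (1.0 - inner(x, y))
    return [vi + coef * (xi + yi) for vi, xi, yi in zip(v, x, y)]


def exp_map(o, v):
    """Point reached from o by following the geodesic with initial velocity v."""
    norm = math.sqrt(max(inner(v, v), 0.0))
    if norm == 0.0:
        return list(o)
    return [math.cosh(norm) * oi + math.sinh(norm) * vi / norm for oi, vi in zip(o, v)]


def align(point, o_local, o_global):
    """Eq. 8: exp at o_g of the transported log at o_k."""
    return exp_map(o_global, transport(o_local, o_global, log_map(o_local, point)))


# Hypothetical frames and observation embedding in H^2.
o_global = [1.0, 0.0, 0.0]
o_local = [math.cosh(0.8), math.sinh(0.8), 0.0]
obs = [math.sqrt(1.0 + 0.9 ** 2 + 0.4 ** 2), 0.9, 0.4]

aligned = align(obs, o_local, o_global)
print(dist(o_local, obs), dist(o_global, aligned))
```

The two printed distances agree up to rounding, which is the isometry property the alignment relies on.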

Here, we provide a simple example of hyperbolic spatial hypergraph construction. As shown in Fig. 4, an observation region is covered by three sources at different resolutions: a CTX mosaic (6 m), a HiRISE strip (0.3 m), and a CRISM spectral cube (18 m). In the HySH, the HiRISE observation (finest resolution) is embedded at the largest radial depth, while the CRISM observation (coarsest resolution) is nearest to the origin. A spatial observation hyperedge binds all three observations and their co-located geological features into a single $n$-ary fact, without requiring O(k^2) pairwise edges.

C. Spatiotemporal Retrieval with Cross-Resolution Aggregation

After the construction of the hyperbolic spatial hypergraph, the next step is to retrieve query-relevant multi-source spatial evidence. The retrieval process comprises two phases: spatiotemporal evidence extraction and cross-resolution aggregation.

1) Spatial Intent Extraction and Hyperedge Retrieval: Given a user query q, we first employ the LLM to extract spatial intent, including target entities, spatial constraints (footprint, region), temporal constraints (L_s range, Sol range), and resolution preferences. These are denoted as query elements \mathcal{K}_q.

For each topic entity e_s \in \mathcal{E}_q extracted from the query, we retrieve its incident spatial observation hyperedges \mathcal{F}_{e_s} = \{f_{spa}^n \in \mathcal{F}_{spa} : e_s \in f_{spa}^n\} and derive pseudo-binary triples (e_h, f_{spa}^n, e_t) for pairwise reasoning, following the approach of HyperRAG [18]:

\mathcal{T}_q = \left\{ (e_h, f_{spa}^n, e_t) \mid f_{spa}^n \in \mathcal{F}_{e_s}, \; e_h \in f_{spa}^n, \; e_t \in f_{spa}^n \right\} \tag{9}
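The derivation of pseudo-binary triples in Eq. 9 can be sketched as follows. The hypergraph is represented here as a plain dictionary from hyperedge identifiers to member sets, and self-pairs with e_h = e_t are skipped; both are simplifying assumptions for illustration, as are the sample entities.

```python
from itertools import permutations


def incident_hyperedges(entity, hyperedges):
    """F_{e_s}: hyperedges that contain the given topic entity."""
    return {fid: members for fid, members in hyperedges.items() if entity in members}


def pseudo_binary_triples(topic_entity, hyperedges):
    """Eq. 9: ordered (head, hyperedge, tail) triples drawn from incident hyperedges."""
    triples = []
    for fid, members in incident_hyperedges(topic_entity, hyperedges).items():
        for head, tail in permutations(sorted(members), 2):
            triples.append((head, fid, tail))
    return triples


# Made-up toy hypergraph.
toy_hyperedges = {
    "f1": {"delta", "olivine", "channel"},
    "f2": {"crater rim", "olivine"},
    "f3": {"dune", "basalt"},
}

triples = pseudo_binary_triples("olivine", toy_hyperedges)
for t in triples:
    print(t)
```

Only hyperedges incident to the topic entity contribute, so `f3` produces no triples in this toy case.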

2) Hyperbolic Spatial Encoding and Plausibility Scoring: For each candidate triple, we compute a spatiotemporal encoding that fuses semantic, structural, and physical-spatial signals:

\mathbf{x} = \left[\varphi(q) \| \varphi(e_h) \| \varphi(f_{spa}^n) \| \varphi(e_t) \| \delta(e_h, f_{spa}^n, e_t) \| \psi_{geo}(e_h, e_t)\right] \tag{10}

where \varphi denotes a text embedding model, \delta denotes a structural proximity encoding adapted from SubGraphRAG [19] to operate on hyperedges, and \psi_{geo} is the hyperbolic spatial encoding defined as:

\psi_{geo}(e_h, e_t) = \left[d_K\left(\Phi(e_h), \Phi(e_t)\right), \; \Delta r(e_h, e_t), \; \cos\theta_{bearing}\right] \tag{11}

Here d_K is the geodesic distance in \mathbb{H}_K^d capturing physical proximity, \Delta r = |r(\Phi(e_h)) - r(\Phi(e_t))| encodes the scale difference via radial depth gap, and \cos\theta_{bearing} encodes the directional relationship. A lightweight MLP classifier f_\theta then scores the plausibility of each candidate triple:

\text{score}(e_h, f_{spa}^n, e_t) = f_\theta(\mathbf{x}) \in [0, 1] \tag{12}

Top-scored triples are retained and their tail entities form the frontier for next-hop expansion, following an adaptive search strategy with density-aware thresholding as in [18]. Specifically, we initialize with threshold \tau_0 = 0.5 and iteratively reduce by a decay factor c = 0.1 if the number of retrieved triples falls below a minimum acceptable count M, ensuring sufficient evidence coverage in sparse regions while preventing over-retrieval in dense regions.
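A minimal sketch of the adaptive thresholding loop is given below. The text does not state whether the decay is subtractive or multiplicative; this sketch assumes a subtractive step of c = 0.1 with a floor at zero, and the function name `adaptive_select` and the sample scores are hypothetical.

```python
def adaptive_select(scored, tau0=0.5, decay=0.1, min_count=3):
    """Lower the threshold until at least min_count triples pass or it reaches zero.

    scored is a list of (triple_id, plausibility_score) pairs.
    Returns the retained triple ids and the final threshold.
    """
    tau = tau0
    while True:
        kept = [triple for triple, score in scored if score >= tau]
        if len(kept) >= min_count or tau <= 0.0:
            return kept, tau
        tau = max(0.0, tau - decay)  # assumed subtractive decay


# Sparse region: only one triple clears the initial threshold.
sparse = [("a", 0.9), ("b", 0.45), ("c", 0.35), ("d", 0.1)]
kept_sparse, tau_sparse = adaptive_select(sparse)

# Dense region: enough triples clear the initial threshold, so it is not lowered.
dense = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.2)]
kept_dense, tau_dense = adaptive_select(dense)

print(kept_sparse, tau_sparse)
print(kept_dense, tau_dense)
```

In the sparse case the threshold is relaxed twice before three triples pass, while the dense case stops at the initial threshold and leaves the low-scoring triple out.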

3) Spatial Outward Einstein Midpoint Aggregation: After retrieval, the selected multi-source evidence typically spans multiple resolutions. To aggregate these into a unified representation without losing fine-scale information, we introduce the Spatial Outward Einstein Midpoint (Spatial OEM). The motivation stems from a known failure mode: naively averaging hyperbolic embeddings collapses representations toward the origin, destroying the hierarchical structure encoded in radial depth [20].

Given spatial observation hyperedge embeddings \{\Phi(f_i)\}_{i=1}^n \subset \mathbb{H}_K^d with query-relevance weights w_i and resolution-aware radial weighting \phi_{res}(f_i) = r(\Phi(f_i))^p:

\mathbf{m}_{K,p}^{Spa\text{-}OEM} = \Pi_K\left(\frac{\sum_{i=1}^{n} w_i \cdot \phi_{res}(f_i) \cdot \lambda_i \cdot \Phi(f_i)}{\sum_{i=1}^{n} w_i \cdot \phi_{res}(f_i) \cdot \lambda_i}\right) \tag{13}

where \lambda_i = \Phi(f_i)_0 is the Lorentz factor and \Pi_K denotes reprojection onto \mathbb{H}_K^d, defined as \Pi_K(\mathbf{v}) = \frac{\mathbf{v}}{\sqrt{K \langle \mathbf{v}, \mathbf{v} \rangle_L}} for \mathbf{v} with \langle \mathbf{v}, \mathbf{v} \rangle_L < 0 and v_0 > 0.

Theorem 1 (Spatial OEM Outward Bias). For p \geq 1, the Spatial OEM satisfies:

r(\mathbf{m}_{K,p}^{Spa\text{-}OEM}) \geq r(\mathbf{m}_K^{Ein})

where \mathbf{m}_K^{Ein} is the standard Einstein midpoint (p = 0).

Proof. The OEM weights \tilde{w}_i \propto w_i \cdot r(\Phi(f_i))^{p+1} concentrate more mass on high-radius points than the Einstein weights w_i \cdot r(\Phi(f_i)). By the Chebyshev sum inequality applied to the co-monotonic sequences a_i = r(\Phi(f_i))^{p+1} and b_i = r(\Phi(f_i)), the pre-projection time component satisfies \tilde{v}_0 \geq \bar{r}_w (weighted mean radius). Since reprojection \Pi_K preserves the ordering of time components, the result follows. \square

The outward bias guarantees that high-resolution observations dominate the aggregated representation. This is essential for planetary science retrieval: when a user queries a specific geological feature, the aggregated evidence should preserve the fine-scale observational details rather than being smoothed into a coarse-resolution summary.
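The aggregation in Eq. 13 can be illustrated with the sketch below, which assumes K = -1 by default and uses hypothetical helper names. For simplicity the sample embeddings lie along a single ray from the origin, a configuration in which the outward shift for larger p is easy to verify by hand; the example is not intended as evidence about arbitrary point configurations.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def reproject(v, K=-1.0):
    """Pi_K(v) = v / sqrt(K * <v, v>_L), valid for time-like v with v0 > 0."""
    scale = math.sqrt(K * lorentz_inner(v, v))
    return [vi / scale for vi in v]


def spatial_oem(points, weights, p=1.0, K=-1.0):
    """Eq. 13 with coefficients w_i * r_i^p * lambda_i, where r_i = lambda_i = x0."""
    dim = len(points[0])
    coeffs = [w * (x[0] ** p) * x[0] for w, x in zip(weights, points)]
    total = sum(coeffs)
    mean = [sum(c * x[j] for c, x in zip(coeffs, points)) / total for j in range(dim)]
    return reproject(mean, K)


def on_ray(t):
    """Point at geodesic distance t from the origin along a fixed ray (K = -1)."""
    return [math.cosh(t), math.sinh(t), 0.0]


points = [on_ray(0.5), on_ray(1.5), on_ray(3.0)]
weights = [1.0, 1.0, 1.0]

einstein = spatial_oem(points, weights, p=0.0)  # p = 0 baseline
outward = spatial_oem(points, weights, p=2.0)   # outward-biased aggregate
print(einstein[0], outward[0])
```

Raising p moves the aggregate toward the deepest (finest-resolution) embedding in this example, while the result stays on the hyperboloid thanks to the reprojection step.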

D. Physics-Informed Conflict Triage

We define the multi-source spatial evidence retrieved in a single query as observation-grounded homologous data (Definition 4). Although targeting the same query object, these data often provide inconsistent factual statements due to differences in instrument principles, observation geometry, and acquisition epochs. Unlike MultiRAG's Multi-level Confidence Computing (MCC), which assumes that inconsistency indicates unreliability and employs mutual information entropy to filter conflicting nodes, we adopt a fundamentally different paradigm: Physics-Informed Conflict Triage (PICT), which classifies conflicts by their physical origin and applies differentiated processing strategies.

1) Cross-Source Interaction Entropy: The first stage detects conflicts by measuring the information-theoretic interaction effect when two sources are jointly presented to the LLM. Existing entropy-based conflict detection methods, such as TruthfulRAG [17], compare retrieval-augmented entropy against parametric-only entropy (\Delta H_p = H(P_{aug}) - H(P_{param})). However, this formulation is inapplicable to our setting where all knowledge is external observational data rather than LLM parametric knowledge. We instead propose cross-source interaction entropy that measures the mutual interference between two observation sources:

\mathcal{H}_{inter}(p_i, p_j \mid q) = H\left(P(\text{ans} \mid q, p_i \oplus p_j)\right) - \frac{1}{2}\left[H\left(P(\text{ans} \mid q, p_i)\right) + H\left(P(\text{ans} \mid q, p_j)\right)\right] \tag{14}

where H(\cdot) is the token-averaged entropy over top-k candidate tokens:

H\left(P(\text{ans} \mid \text{context})\right) = -\frac{1}{|l|}\sum_{t=1}^{|l|}\sum_{i=1}^{k} pr_i^{(t)} \log_2 pr_i^{(t)} \tag{15}

Here |l| is the length of the generated answer in tokens, pr_i^{(t)} is the probability of the i-th of the top-k candidate tokens at position t, and p_i \oplus p_j denotes the concatenation of the reasoning paths derived from sources \mathcal{K}_i and \mathcal{K}_j, respectively. The interaction entropy admits a clear interpretation: positive values (\mathcal{H}_{inter} > 0, super-additive uncertainty) indicate that the two sources contradict each other, jointly creating more confusion than either alone; near-zero values indicate independence or consistency; negative values (sub-additive) indicate mutual complementarity where the sources reinforce each other.

Reasoning path pairs whose interaction entropy exceeds a predefined threshold \epsilon > 0 are flagged as detected conflicts, where \psi_i and \psi_j denote the factual statements asserted along p_i and p_j:

\mathcal{C}^{detected} = \{(\psi_i, \psi_j) \mid \mathcal{H}_{inter}(p_i, p_j \mid q) > \epsilon\} \tag{16}
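Eqs. 14-16 can be sketched as follows, given per-position top-k probability lists that would in practice come from the LLM. The probabilities are used as supplied (the text does not say whether the top-k mass is renormalized), and the function names, pair identifiers, and toy distributions are all hypothetical.

```python
import math


def token_entropy(dists):
    """Eq. 15: mean over answer positions of the base-2 entropy of the top-k probabilities."""
    total = 0.0
    for probs in dists:
        total -= sum(p * math.log2(p) for p in probs if p > 0.0)
    return total / len(dists)


def interaction_entropy(joint, only_i, only_j):
    """Eq. 14: H(joint context) minus the mean of the single-source entropies."""
    return token_entropy(joint) - 0.5 * (token_entropy(only_i) + token_entropy(only_j))


def detect_conflicts(pairs, eps):
    """Eq. 16: ids of pairs whose interaction entropy exceeds eps."""
    return [pid for pid, (joint, a, b) in pairs.items()
            if interaction_entropy(joint, a, b) > eps]


confident = [[1.0, 0.0]]   # one answer position, no uncertainty
confused = [[0.5, 0.5]]    # one answer position, maximal two-way uncertainty

pairs = {
    "pair_conflict": (confused, confident, confident),
    "pair_consistent": (confident, confident, confident),
}
detected = detect_conflicts(pairs, eps=0.2)
print(detected)
```

Each source alone yields a confident answer in the first pair, but the joint context does not, which is the super-additive pattern described above.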

2) Physics-Informed Conflict Classification: The second stage classifies each detected conflict by its physical origin. We introduce the central distinction of PICT:

Definition 7. Explainable conflict and opaque conflict. A pairwise conflict (\psi_i, \psi_j) \in \mathcal{C}^{detected} is explainable if there exists a physical bridging function \mathcal{B} such that:

\mathcal{B}(\Omega_i, \Omega_j, \mathcal{M}_i, \mathcal{M}_j) \models \neg(\psi_i \bot \psi_j) \tag{17}

i.e., the apparent inconsistency is resolvable by accounting for observation constraint differences (\Omega_i, \Omega_j) and measurement model differences (\mathcal{M}_i, \mathcal{M}_j). Otherwise, the conflict is opaque.

Based on this distinction, we define four conflict categories, each with a differentiated processing strategy:

| Category | Condition | Strategy |
| --- | --- | --- |
| Noise (\mathcal{C}^{noise}) | Opaque, with significant source authority disparity | Filter low-authority source |
| Instrument-Inherent (\mathcal{C}^{inst}) | Explainable via \Omega_i \neq \Omega_j | Preserve with physical explanation |
| Scale-Dependent (\mathcal{C}^{scale}) | Explainable via \ell_{res}^i \neq \ell_{res}^j | Preserve with cross-scale linkage |
| Temporal-Evolution (\mathcal{C}^{temp}) | Explainable via \mathcal{T}_i \neq \mathcal{T}_j | Preserve with temporal ordering |

For each detected conflict, we construct a feature vector that fuses information-theoretic, physical, and neural signals:

\mathbf{z}_{conf} = \left[\mathcal{H}_{inter}, \; \|\Omega_i - \Omega_j\|, \; |\log(\ell_{res}^i / \ell_{res}^j)|, \; \Delta\mathcal{T}, \; \rho_{auth}(i,j), \; \mathbf{h}^{(l^*)}_{conf}\right] \tag{18}

where \|\Omega_i - \Omega_j\| is the observation geometry disparity, |\log(\ell_{res}^i / \ell_{res}^j)| is the resolution ratio in log-scale, \Delta\mathcal{T} is the temporal separation, \rho_{auth}(i,j) is the authority disparity between sources, and \mathbf{h}^{(l^*)}_{conf} is the LLM hidden state at the conflict encoding layer. The inclusion of \mathbf{h}^{(l^*)}_{conf} is motivated by the finding that knowledge conflict signals concentrate in mid-to-late layers of LLMs and are linearly separable with > 93% AUC [21].

A lightweight classifier maps the feature vector to conflict type:

\hat{c} = \arg\max_{c \in \{noise, inst, scale, temp\}} P_\theta(c \mid \mathbf{z}_{conf}) \tag{19}
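The assembly of \mathbf{z}_{conf} in Eq. 18 and the decision rule in Eq. 19 can be sketched as below. Several choices here are assumptions rather than details given in the text: the geometry disparity is taken as a Euclidean norm, the temporal separation and authority disparity as absolute differences of scalar values, and the class probabilities are supplied directly instead of coming from a trained classifier. All sample numbers are made up.

```python
import math


def conflict_features(h_inter, omega_i, omega_j, res_i, res_j, t_i, t_j,
                      auth_i, auth_j, hidden):
    """Assemble z_conf (Eq. 18) as a flat list of floats."""
    geom_gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(omega_i, omega_j)))
    log_res_ratio = abs(math.log(res_i / res_j))
    delta_t = abs(t_i - t_j)          # assumed scalar time stamps
    auth_gap = abs(auth_i - auth_j)   # assumed scalar authority scores
    return [h_inter, geom_gap, log_res_ratio, delta_t, auth_gap, *hidden]


def predict_type(class_probs):
    """Eq. 19: the conflict type with the highest predicted probability."""
    return max(class_probs, key=class_probs.get)


z_conf = conflict_features(
    h_inter=0.8,
    omega_i=(0.3, 0.7, 5.0, 0.0), omega_j=(18.0, 1.0, 5.0, 0.0),
    res_i=0.3, res_j=18.0,
    t_i=120.0, t_j=121.0,
    auth_i=0.9, auth_j=0.85,
    hidden=[0.12, -0.40, 0.07],       # stand-in for the LLM hidden state
)
label = predict_type({"noise": 0.05, "inst": 0.25, "scale": 0.60, "temp": 0.10})
print(z_conf, label)
```

In a real pipeline the physical features would likely need normalization before being concatenated with the hidden state, since they carry different units.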

Proposition 2 (Conflict Type Separability). The four conflict types are distinguished by orthogonal physical dimensions: \|\Omega_i - \Omega_j\| separates instrument conflicts; |\log(\ell_{res}^i / \ell_{res}^j)| separates scale conflicts; \Delta\mathcal{T} separates temporal conflicts; \rho_{auth} separates noise conflicts. Since these physical features are independent of and complementary to the hidden state features \mathbf{h}^{(l^*)}_{conf} (which encode semantic inconsistency), the four conflict types are linearly separable in the augmented feature space \mathbf{z}_{conf}.

3) Conflict-Aware Confidence Recalibration: Based on the classification result, we recalibrate the node confidence. This is the key departure from MultiRAG's MCC, which uniformly penalizes inconsistency:

C_{triage}(v) = \begin{cases} C_{base}(v) & \text{if } v \notin \mathcal{C}^{detected} \\ \alpha \cdot C_{base}(v) + (1-\alpha) \cdot \eta & \text{if } \hat{c} = noise \\ C_{base}(v) + \beta \cdot \mathcal{H}_{inter}^{-1} & \text{if } \hat{c} \in \{inst, scale\} \\ C_{base}(v) \cdot \gamma(|\Delta\mathcal{T}|) & \text{if } \hat{c} = temp \end{cases} \tag{20}

where C_{base}(v) is the baseline confidence computed via semantic similarity (analogous to the node consistency score in [14]), \alpha \in [0, 1] is an interpolation weight, \eta < 0 is a penalty term for noise conflicts, \beta > 0 is a boost coefficient for scientifically explainable conflicts, and \gamma(|\Delta\mathcal{T}|) is a weighting function that decays with the temporal separation between the two observations while preserving temporal evolution signals. Specifically, \gamma(|\Delta\mathcal{T}|) = 1 + \beta_{temp} \cdot \exp(-|\Delta\mathcal{T}| / \tau_{decay}), where \tau_{decay} > 0 is the decay constant and \beta_{temp} > 0 ensures \gamma > 1 for temporal contrasts with scientific significance.

Theorem 2 (Anti-Over-Smoothing Guarantee). Let V_{sci} \subset V denote the set of nodes involved in explainable scientific conflicts (\mathcal{C}^{inst} \cup \mathcal{C}^{scale} \cup \mathcal{C}^{temp}). Under PICT with \beta > 0 and C_{base}(v) > 0:

C_{triage}(v) > C_{base}(v) \quad \forall v \in V_{sci} \tag{21}

Proof. For v \in \mathcal{C}^{inst} \cup \mathcal{C}^{scale}: C_{triage}(v) = C_{base}(v) + \beta \cdot \mathcal{H}_{inter}^{-1}. Since \beta > 0 and \mathcal{H}_{inter} > \epsilon > 0 (by the detection threshold in Eq. 16), \beta \cdot \mathcal{H}_{inter}^{-1} > 0, thus C_{triage}(v) > C_{base}(v). For v \in \mathcal{C}^{temp}: \gamma(|\Delta\mathcal{T}|) > 1 by construction (since \beta_{temp} > 0 and \exp(\cdot) > 0), and because C_{base}(v) > 0, C_{triage}(v) = C_{base}(v) \cdot \gamma(|\Delta\mathcal{T}|) > C_{base}(v). \square

This theorem provides a formal guarantee that scientifically valuable conflict nodes can never be suppressed below their baseline confidence by the triage mechanism, directly addressing the over-smoothing problem identified in Section I.
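The recalibration rule of Eq. 20 translates directly into a small function. The sketch below is illustrative only: every default parameter value (\alpha, \eta, \beta, \beta_{temp}, \tau_{decay}) is a hypothetical placeholder rather than a value reported in this paper, and the baseline confidence is assumed to be positive.

```python
import math


def gamma_temporal(delta_t, beta_temp=0.5, tau_decay=90.0):
    """gamma(|dT|) = 1 + beta_temp * exp(-|dT| / tau_decay); always greater than 1."""
    return 1.0 + beta_temp * math.exp(-abs(delta_t) / tau_decay)


def triage_confidence(c_base, conflict_type=None, h_inter=None, delta_t=0.0,
                      alpha=0.5, eta=-0.2, beta=0.1):
    """Eq. 20. conflict_type is None for nodes outside the detected conflicts."""
    if conflict_type is None:
        return c_base
    if conflict_type == "noise":
        return alpha * c_base + (1.0 - alpha) * eta
    if conflict_type in ("inst", "scale"):
        return c_base + beta / h_inter   # h_inter > eps > 0 for detected conflicts
    if conflict_type == "temp":
        return c_base * gamma_temporal(delta_t)
    raise ValueError(f"unknown conflict type: {conflict_type}")


base = 0.6
results = {
    "none": triage_confidence(base),
    "noise": triage_confidence(base, "noise"),
    "inst": triage_confidence(base, "inst", h_inter=0.8),
    "scale": triage_confidence(base, "scale", h_inter=0.8),
    "temp": triage_confidence(base, "temp", delta_t=30.0),
}
print(results)
```

With these placeholder values the noise branch lowers the confidence while the three explainable branches raise it above the baseline, which is the behavior Theorem 2 describes.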

E. AreoRAG Prompting

We propose the AreoRAG Prompting (ARP) algorithm for multi-source planetary spatial data retrieval. The complete procedure is presented in Algorithm 1.


Algorithm 1. AreoRAG Prompting (ARP)

procedure ARP(q)

\quad \mathcal{E}_q, \mathcal{R}_q, \mathcal{P}_{foot}, \mathcal{T}_{win} \leftarrow Spatial Intent Extraction(q)

\quad D_q \leftarrow Multi-source Spatial Adapter Parsing(D) \quad\triangleright Eqs. 4-5

\quad \mathcal{G}_{hyp} \leftarrow HySH Construction(D_q) \quad\triangleright Eqs. 6-8

\quad \mathcal{T}_q \leftarrow Spatiotemporal Retrieval(\mathcal{G}_{hyp}, \mathcal{E}_q) \quad\triangleright Eqs. 9-12

\quad \mathbf{m}_{agg} \leftarrow Spatial OEM Aggregation(\mathcal{T}_q) \quad\triangleright Eq. 13

\quad \mathcal{C}^{detected} \leftarrow Cross-Source Interaction Entropy(\mathcal{T}_q, q) \quad\triangleright Eqs. 14-16

\quad for (\psi_i, \psi_j) \in \mathcal{C}^{detected} do

\quad\quad \hat{c} \leftarrow Conflict Classification(\mathbf{z}_{conf}) \quad\triangleright Eqs. 18-19

\quad\quad C_{triage}(v) \leftarrow Confidence Recalibration(v, \hat{c}) \quad\triangleright Eq. 20

\quad end for

\quad Context \leftarrow Differential Context Construction(q, \mathcal{T}_q, \hat{c})

\quad Answer \leftarrow LLM(q \oplus Context \oplus Provenance)

\quad return Answer

end procedure


Given a user query q, the LLM is first employed to extract entities, spatial constraints (\mathcal{P}_{foot}, region), and temporal constraints (\mathcal{T}_{win}, L_s range), generating corresponding logical and spatial relationships. The observation data then undergoes multi-source spatial adapter parsing to derive normalized datasets (Eq. 4), followed by constructing a Hyperbolic Spatial Hypergraph via scale-aware Lorentz embedding and cross-reference-frame alignment (Eq. 6-8).

Subsequently, spatiotemporal retrieval is performed using hyperbolic spatial encoding and MLP-based plausibility scoring (Eq. 10-12), with Spatial OEM aggregation (Eq. 13) to produce a unified cross-resolution representation. The cross-source interaction entropy mechanism (Eq. 14-16) then detects inter-source conflicts, after which each detected conflict is classified via the physics-informed feature vector (Eq. 18-19) and the node confidence is recalibrated accordingly (Eq. 20).

The final step constructs a differential context based on the triage result. For noise conflicts, the low-authority source is filtered, compatible with conventional conflict elimination. For instrument-inherent and scale-dependent conflicts, both sources are preserved with a physical bridging explanation \mathcal{B}(\Omega_i, \Omega_j) appended to the context, enabling the LLM to reason about the physical origin of the disagreement. For temporal-evolution conflicts, a temporal ordering is constructed, allowing the LLM to trace the evolution of observations over time. All preserved evidence carries provenance metadata (DataID, source institution, instrument identity, observation timestamp in L_s) to ensure scientific traceability, analogous to the citation anchors in Perplexity-style retrieval systems.
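The per-category context construction described above can be sketched as a simple dispatch on the triage label. The record layout, the dictionary keys, the `mars_year` field used to disambiguate L_s across Mars years, and all sample statements and identifiers are hypothetical and serve only to illustrate the control flow.

```python
def provenance(src):
    """Render the provenance fields named in the text; the dict keys are hypothetical."""
    return (f"[{src['data_id']} | {src['institution']} | {src['instrument']} | "
            f"MY{src['mars_year']} Ls={src['ls']:.1f}]")


def build_context(conflicts):
    """Return context lines for a list of triaged conflict records."""
    lines = []
    for c in conflicts:
        a, b = c["sources"]
        if c["type"] == "noise":
            # Keep only the higher-authority source.
            keep = a if a["authority"] >= b["authority"] else b
            lines.append(f"{keep['statement']} {provenance(keep)}")
        elif c["type"] in ("inst", "scale"):
            # Preserve both sources and append the bridging explanation.
            for s in (a, b):
                lines.append(f"{s['statement']} {provenance(s)}")
            lines.append(f"Physical explanation: {c['bridge']}")
        elif c["type"] == "temp":
            # Preserve both sources in chronological order.
            for s in sorted((a, b), key=lambda s: (s["mars_year"], s["ls"])):
                lines.append(f"{s['statement']} {provenance(s)}")
        else:
            raise ValueError(f"unknown conflict type: {c['type']}")
    return lines


# Made-up sample sources.
src_a = {"data_id": "OBS-A", "institution": "Institution A",
         "instrument": "Orbital spectrometer", "mars_year": 36, "ls": 120.0,
         "authority": 0.9, "statement": "Hydrated minerals detected at the site."}
src_b = {"data_id": "OBS-B", "institution": "Institution B",
         "instrument": "Rover spectrometer", "mars_year": 35, "ls": 300.0,
         "authority": 0.4, "statement": "No hydration signature at the site."}

noise_lines = build_context([{"type": "noise", "sources": (src_a, src_b)}])
temp_lines = build_context([{"type": "temp", "sources": (src_a, src_b)}])
scale_lines = build_context([{"type": "scale", "sources": (src_a, src_b),
                              "bridge": "The two instruments sample different spatial scales."}])
print("\n".join(noise_lines + temp_lines + scale_lines))
```

Every retained line carries its provenance tag, so the downstream LLM prompt keeps each statement traceable to its source.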

It should be noted that the ARP algorithm constructs the HySH offline as a preprocessing step, while the PICT module operates online during each query. The HySH construction time is dominated by the LLM-based entity extraction (comparable to MultiRAG's MLG construction), while the online PICT overhead consists primarily of |\mathcal{C}^{detected}| forward passes through the lightweight conflict classifier (Eq. 19), which is negligible compared to the LLM generation cost.