AreoRAG: Hyperbolic Spatial Hypergraph and Physics-Informed Conflict Triage for Multi-Source Planetary Retrieval Augmented Generation

Author Name$^{1}$, Author Name$^{2\,\text{\ding{42}}}$, Author Name$^{1}$

$^{1}$ Affiliation One

$^{2}$ Affiliation Two

Email: {author1, author2}@example.edu

Abstract — Retrieval Augmented Generation (RAG) has demonstrated considerable promise in grounding Large Language Models (LLMs) with external knowledge for knowledge-intensive question answering. However, extending RAG to the domain of planetary science — where multi-source remote sensing observations are inherently embedded in continuous physical space and inter-source disagreements often carry scientific value — introduces fundamental challenges that existing multi-source RAG frameworks cannot address. These challenges manifest in two critical aspects: (1) existing discrete graph topologies (e.g., multi-source line graphs) suffer from edge explosion when encoding continuous spatial proximity, failing to bridge the gap between physical continuity and semantic discreteness; and (2) conventional conflict-filtering mechanisms, designed under the assumption that inter-source inconsistency implies unreliability, systematically suppress scientifically valuable observational disagreements that are intrinsic to multi-platform deep-space exploration. To address these challenges, we propose AreoRAG, a novel framework tailored for multi-source planetary spatial data retrieval augmented generation. 
Our framework introduces two key innovations: (1) a Hyperbolic Spatial Hypergraph (HySH) construction module that employs $n$-ary spatial observation hyperedges embedded in hyperbolic space via the Lorentz model, where spatial resolution is coupled with radial depth to faithfully represent the hierarchical scale structure of planetary observations while reducing edge complexity from $O(k^2)$ to $O(k)$; and (2) a Physics-Informed Conflict Triage (PICT) module that detects inter-source conflicts via cross-source interaction entropy, classifies them into four physically grounded categories (noise, instrument-inherent, scale-dependent, and temporal-evolution), and applies differentiated confidence recalibration to preserve scientifically valuable disagreements while filtering genuine noise. Extensive experiments on multi-source planetary observation datasets demonstrate that AreoRAG significantly enhances both the retrieval fidelity and the scientific faithfulness of knowledge-augmented generation in planetary science scenarios.

Index Terms — Retrieval Augmented Generation, Planetary Remote Sensing, Hyperbolic Hypergraph, Knowledge Conflict Triage, Multi-source Spatial Data, Mars Exploration

I. INTRODUCTION

The past two decades have witnessed an unprecedented accumulation of multi-source remote sensing data from Mars exploration missions. Orbital platforms such as the Mars Reconnaissance Orbiter (MRO), Mars Express, and Tianwen-1 continuously acquire observations spanning diverse modalities, from sub-meter optical imagery (HiRISE at 0.3 m/pixel) and medium-resolution contextual mosaics (CTX at 6 m/pixel) to hyperspectral mineralogical mapping (CRISM at 18 m/pixel) and global topographic models (MOLA at ~460 m/pixel). Simultaneously, surface assets including the Curiosity and Zhurong rovers generate complementary in-situ measurements through spectrometers, ground-penetrating radar, and navigation cameras. This rapidly expanding, multi-source, multi-resolution data ecosystem has created a pressing demand for intelligent knowledge retrieval systems that can support planetary scientists in conducting semantic search, cross-source correlation, and multi-scale reasoning over heterogeneous observation archives [1]-[4].

Large Language Models (LLMs) have emerged as powerful tools for natural language understanding and generation [5], and Retrieval Augmented Generation (RAG) has been established as a standard paradigm for grounding LLM responses in external knowledge bases [6]-[8]. By dynamically retrieving relevant documents and conditioning generation on retrieved context, RAG effectively mitigates the hallucination problem inherent in LLMs and enables knowledge-intensive question answering. The synergy between LLMs and Knowledge Graphs (KGs) has further advanced retrieval performance through structured knowledge representation, achieving notable improvements in multi-hop reasoning, credibility assessment, and interpretability [9]-[13].

Nevertheless, deploying RAG systems for planetary science knowledge retrieval introduces domain-specific complexities that fundamentally challenge existing frameworks. Unlike conventional multi-source retrieval scenarios (e.g., integrating flight records, financial reports, or web documents), planetary observation data possesses two distinctive characteristics. First, all data sources are spatially grounded: each observation is anchored to a specific spatial footprint on the Martian surface, a temporal acquisition window parameterized by Solar Longitude ($L_s$), and instrument-specific parameters such as spectral bands and spatial resolution. The relevance between two observations is therefore governed not merely by textual semantic similarity, but primarily by physical spatial proximity, temporal co-occurrence, and cross-resolution complementarity. Second, inter-source inconsistencies in planetary science are not exclusively indicative of data errors or model hallucinations; rather, they frequently arise as inherent consequences of multi-platform, multi-scale observation and may encode critical scientific discoveries, such as subsurface geological evolution revealed by discrepancies between orbital spectroscopy and in-situ drilling results.

Recent advances in multi-source RAG, exemplified by MultiRAG [14], have made significant progress in addressing data sparsity and inter-source inconsistency through multi-source line graphs and multi-level confidence computation. However, when confronted with planetary spatial data, these methods encounter two structural bottlenecks that cannot be resolved through parameter tuning alone.

Building upon the analysis of existing multi-source RAG limitations [14]-[16] in the context of planetary science, we identify the following failure modes that are unique to spatially grounded, physically observed multi-source data:

  1. Spatial topology distortion: When multi-source observations share no common textual entities but are spatially co-located, discrete line graphs fail to establish connectivity, resulting in fragmented retrieval.

  2. Scale hierarchy collapse: Observations at different spatial resolutions (e.g., 0.3 m vs. 460 m) exhibit a natural hierarchical containment structure that flat graph topologies cannot represent, leading to loss of cross-resolution context during aggregation.

  3. Scientifically valuable conflict suppression: Confidence-based conflict filtering indiscriminately eliminates disagreeing nodes, destroying observational evidence that may indicate genuine geological phenomena such as subsurface mineral heterogeneity.

These failure modes trace back to two fundamental scientific problems:

Problem 1: Discrete Representation Failure for Continuous Spatiotemporal Topology. Existing multi-source knowledge aggregation methods, such as multi-source line graphs [14], rely on discrete text entities and explicit semantic associations to construct graph topology. However, planetary science data is intrinsically embedded in continuous Euclidean physical space. Attempting to encode continuous spatial proximity and directional relationships within traditional discrete graph structures inevitably triggers an edge explosion problem: $k$ co-located spatial entities require $\binom{k}{2} = O(k^2)$ pairwise spatial proximity edges, undermining the optimizations that existing graph models achieve for handling data sparsity. The discrete logical graph structure thus constitutes a structural bottleneck for planetary spatial reasoning, unable to bridge the gap between physical continuity and semantic discreteness.
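The growth rates in Problem 1 can be made concrete with a small counting sketch. It compares the number of pairwise proximity edges needed for $k$ co-located observations with the number of node-to-hyperedge incidences when a single $n$-ary hyperedge binds them. The function names are illustrative assumptions and are not part of any AreoRAG implementation.

```python
from math import comb


def pairwise_edge_count(k: int) -> int:
    """Edges needed to link k co-located observations pairwise: C(k, 2)."""
    return comb(k, 2)


def hyperedge_incidence_count(k: int) -> int:
    """Incidences when one n-ary hyperedge binds all k observations: k."""
    return k


if __name__ == "__main__":
    # Hypothetical group sizes, chosen only to show the growth rates.
    for k in (4, 16, 64):
        print(k, pairwise_edge_count(k), hyperedge_incidence_count(k))
```

For 64 co-located observations, the pairwise encoding needs 2016 edges, whereas the hyperedge encoding needs 64 incidences.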

Problem 2: Fundamental Conflict Between Scientific Cognitive Divergence and Traditional De-Falsification Mechanisms. Existing multi-source RAG frameworks assume that inter-source data inconsistency typically originates from misinformation or model hallucinations, and they therefore rely on multi-level confidence computation to eliminate conflicting nodes [14], [17]. However, in deep-space exploration scenarios, absolute ground truth is unavailable, and different observation platforms (e.g., orbiters versus rovers) often produce significantly conflicting results for the same target region because they differ in observation scale, penetration depth, and instrumental principles. For instance, orbital spectrometers may detect surface hydrated minerals while in-situ drilling reveals no anomaly. Such a conflict arises not from data error but from the inherently multi-dimensional nature of scientific observation, and it may hold clues to major discoveries such as geological evolution. Applying existing conflict-filtering mechanisms indiscriminately would cause severe over-smoothing, uniformly suppressing high-value scientific anomalies and violating a core epistemological principle of deep-space exploration: preserving controversy and enabling multi-source corroboration for knowledge discovery.

To address these two fundamental challenges, we propose AreoRAG, a novel framework specifically designed for multi-source planetary spatial data retrieval augmented generation. AreoRAG introduces two synergistic innovations. First, to resolve Problem 1, we construct a Hyperbolic Spatial Hypergraph (HySH) that employs $n$-ary spatial observation hyperedges to bind co-located multi-source observations into single high-order facts, reducing edge complexity from $O(k^2)$ to $O(k)$. These hyperedges are embedded in hyperbolic space via the Lorentz model, where the exponential volume growth of negative-curvature geometry naturally accommodates the hierarchical scale structure of planetary observations: coarse-resolution global data resides near the origin while fine-resolution local data extends toward the boundary. Second, to resolve Problem 2, we develop a Physics-Informed Conflict Triage (PICT) mechanism that replaces the uniform conflict-filtering paradigm with a differentiated triage approach. PICT detects inter-source conflicts through cross-source interaction entropy, classifies each conflict into one of four physically grounded categories (noise, instrument-inherent, scale-dependent, temporal-evolution), and applies category-specific confidence recalibration, filtering genuine noise while provably preserving and even boosting the confidence of scientifically valuable observational disagreements. Together, HySH provides spatially faithful multi-source evidence to PICT, while PICT feeds back triage results to prioritize scientifically interesting regions in subsequent retrieval, forming a tightly coupled framework.
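The coupling between spatial resolution and radial depth can be illustrated with a minimal sketch of the Lorentz (hyperboloid) model at curvature $-1$. The Lorentzian inner product and geodesic distance are the standard ones. The logarithmic mapping from resolution to radius in `radial_depth`, the reference value `coarsest_m`, and all function names are assumptions made for illustration only. They are not the coupling defined by HySH.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def radial_depth(resolution_m, coarsest_m=460.0):
    """Assumed coupling: finer resolution maps to a larger hyperbolic radius."""
    return math.log(coarsest_m / resolution_m)


def embed(direction, resolution_m):
    """Place an observation on the unit hyperboloid along a given direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    r = radial_depth(resolution_m)
    return [math.cosh(r)] + [math.sinh(r) * d / norm for d in direction]


def lorentz_distance(x, y):
    """Geodesic distance; max() guards against rounding slightly below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


origin = [1.0, 0.0, 0.0]
mola = embed([1.0, 0.0], 460.0)   # coarse global topography
ctx = embed([1.0, 0.0], 6.0)      # medium-resolution context imagery
hirise = embed([1.0, 0.0], 0.3)   # sub-meter imagery

if __name__ == "__main__":
    for name, point in (("MOLA", mola), ("CTX", ctx), ("HiRISE", hirise)):
        print(name, round(lorentz_distance(origin, point), 3))
```

Under this assumed mapping, the MOLA-scale observation sits at the origin and the HiRISE-scale observation lies farthest from it, matching the qualitative layout described above.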

The contributions of this paper are summarized as follows:

  1. Hyperbolic Spatial Hypergraph Construction: We introduce HySH, a knowledge construction module that employs $n$-ary spatial observation hyperedges embedded in hyperbolic space to achieve unified spatiotemporal representation of multi-source planetary data. By coupling spatial resolution with hyperbolic radial depth via the Lorentz model, HySH faithfully preserves the hierarchical scale structure of planetary observations while eliminating edge explosion through high-order relational encoding. A resolution-aware Spatial Outward Einstein Midpoint (Spatial OEM) aggregation operator is further proposed to prevent hierarchical collapse during cross-resolution evidence fusion, with a formal guarantee of outward bias.

  2. Physics-Informed Conflict Triage: We propose PICT, a retrieval module that fundamentally redefines the role of inter-source conflict in RAG systems. Through cross-source interaction entropy for conflict detection, a physically grounded four-category conflict classification informed by observation geometry, and differentiated confidence recalibration, PICT provably prevents the over-smoothing of scientifically valuable disagreements (Anti-Over-Smoothing Guarantee) while maintaining noise-filtering capability. To the best of our knowledge, this is the first conflict-handling mechanism in RAG that explicitly distinguishes between erroneous inconsistency and scientifically meaningful observational divergence.

  3. Integrated Framework and Experimental Validation: We design the AreoRAG Prompting (ARP) algorithm that integrates HySH and PICT through three explicit coupling points: spatial alignment as a prerequisite for interaction entropy computation, radial depth difference as a resolution disparity signal for conflict classification, and triage-driven retrieval priority feedback. Extensive experiments on multi-source planetary observation datasets demonstrate that AreoRAG significantly outperforms existing multi-source RAG methods in both retrieval fidelity and scientific faithfulness, with particular advantages in scenarios involving cross-resolution reasoning and observation-grounded conflict preservation.
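The difference between uniform conflict filtering and differentiated triage can be sketched as follows. This section does not define cross-source interaction entropy, so the sketch substitutes plain Shannon entropy over per-source claims as a stand-in disagreement signal. The multipliers in `RECALIBRATION` are hypothetical values chosen for illustration, and the function names are assumptions rather than PICT's actual interface.

```python
import math
from collections import Counter

# Hypothetical multipliers: only noise is down-weighted, while the three
# physically explained categories keep or gain confidence.
RECALIBRATION = {
    "noise": 0.2,
    "instrument-inherent": 1.0,
    "scale-dependent": 1.1,
    "temporal-evolution": 1.1,
}


def claim_entropy(claims):
    """Shannon entropy in bits of the claims reported by different sources."""
    counts = Counter(claims)
    total = len(claims)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def recalibrate(confidence, category):
    """Scale a confidence score by its category multiplier, clipped to [0, 1]."""
    return max(0.0, min(1.0, confidence * RECALIBRATION[category]))


if __name__ == "__main__":
    # Orbital spectroscopy and in-situ drilling disagree about one region.
    claims = ["hydrated", "anhydrous"]
    print(claim_entropy(claims))
    print(recalibrate(0.8, "noise"), recalibrate(0.8, "scale-dependent"))
```

A uniform filter would treat every high-entropy region identically. In this sketch, the same disagreement is suppressed only when it is classified as noise.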