## V. RELATED WORK
### A. Graph-Structured Retrieval Augmented Generation
Graph-based methods have become a central paradigm for enhancing the reasoning capabilities and factual grounding of Retrieval Augmented Generation (RAG) systems. Early approaches leveraged curated Knowledge Graphs (KGs) such as Wikidata and Freebase to provide structured triples or reasoning chains for LLM-based question answering [22], [27], [40]. More recently, methods that dynamically construct task-specific graphs from raw corpora have gained prominence. HippoRAG [23] draws inspiration from neurobiology to construct offline memory graphs with a neural indexing mechanism, achieving significant retrieval latency reduction. ToG 2.0 [25] introduces a graph-context co-retrieval framework that dynamically balances structured and unstructured evidence, resulting in substantial hallucination rate reduction compared to unimodal approaches. Graph-CoT [48] leverages Graph Neural Networks to establish bidirectional connections between KGs and the latent space of LLMs, reducing factual inconsistencies on KGQA benchmarks. SubGraphRAG [19] proposes a lightweight MLP-based approach that retrieves query-relevant subgraphs and encodes structural proximity through directional distance encoding, achieving state-of-the-art performance with low latency.
A critical limitation of the above methods is their reliance on binary relational facts (entity-relation-entity triples), which suffer from semantic fragmentation and path explosion when representing complex multi-entity interactions [18]. To address this, hypergraph-based RAG methods have emerged. HyperGraphRAG [25b] advances the field by natively encoding $n$-ary relational facts as hyperedges, outperforming conventional KG-based RAGs through shallower yet more expressive reasoning chains. HyperRAG [18] further introduces a trainable MLP-based retriever (HyperRetriever) that fuses structural and semantic signals for adaptive $n$-ary chain construction, achieving the highest answer accuracy on WikiTopics benchmarks. OG-RAG [34b] grounds hyperedge construction in domain-specific ontologies for more interpretable evidence aggregation, though its dependence on high-quality ontologies constrains scalability.
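
To make the contrast with triple stores concrete, the following minimal sketch (a hypothetical `Hypergraph` class, not code from any cited system) stores an $n$-ary fact as a single hyperedge, so all co-participants of a fact are reachable in one hop rather than through a clique of binary triples:

```python
from collections import defaultdict

class Hypergraph:
    """Minimal hypergraph: one hyperedge connects any number of entities."""

    def __init__(self):
        self.edges = {}                    # edge_id -> set of member entities
        self.incidence = defaultdict(set)  # entity  -> ids of incident edges

    def add_hyperedge(self, edge_id, entities):
        # An n-ary fact becomes ONE edge, not n*(n-1)/2 binary triples.
        self.edges[edge_id] = set(entities)
        for entity in entities:
            self.incidence[entity].add(edge_id)

    def neighbors(self, entity):
        # All entities co-occurring with `entity` in some hyperedge (one hop).
        out = set()
        for edge_id in self.incidence[entity]:
            out |= self.edges[edge_id]
        out.discard(entity)
        return out
```

With this structure, a three-way fact such as (instrument, target, observation time) is retrieved as a unit, which is the shallower-chain property the hypergraph RAG methods above exploit.
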
For multi-source scenarios, MultiRAG [14] proposes multi-source line graphs (MLG) to aggregate cross-domain knowledge and multi-level confidence computing (MCC) to filter unreliable nodes, achieving over 10% F1 improvement on sparse datasets. FusionQuery [34] enhances cross-domain retrieval precision through heterogeneous graph integration with dynamic credibility evaluation. KAG [26] provides a unified representation framework for multi-source KGs through the OpenSPG platform.
Despite this progress, all existing graph-based RAG methods — whether binary, hypergraph, or multi-source line graph — construct their topology based on discrete text entities and explicit semantic associations. None addresses the scenario where data sources are inherently embedded in continuous physical space and where inter-entity relevance is governed by spatial proximity rather than textual co-occurrence. AreoRAG bridges this gap by introducing spatial observation hyperedges embedded in hyperbolic space, enabling faithful representation of continuous spatiotemporal topology within a graph-based retrieval framework.
### B. Hyperbolic Representation Learning for Retrieval
Hyperbolic geometry has attracted increasing attention in representation learning due to its capacity to embed hierarchical, tree-like structures with low distortion [52]-[54]. Unlike Euclidean space, where volume grows polynomially with radius, hyperbolic space exhibits exponential volume growth, naturally accommodating the branching structure of taxonomies, ontologies, and scale hierarchies. Foundational work by Nickel and Kiela [52] demonstrated that Poincaré embeddings of WordNet hierarchies achieve superior link prediction with substantially fewer dimensions than Euclidean counterparts. Subsequent work extended hyperbolic representations to knowledge graph embedding [53], [55], molecular generation [56], and recommendation systems [57].
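
The exponential-volume property can be seen directly from the standard Poincaré-ball distance (an illustrative sketch, not code from the cited works): the same Euclidean displacement costs far more hyperbolic distance near the boundary than near the origin, which is what allows trees to embed with low distortion.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points in the Poincare ball (||x|| < 1)."""
    sq = np.sum((u - v) ** 2)
    du = 1.0 - np.sum(u ** 2)   # conformal factor terms
    dv = 1.0 - np.sum(v ** 2)
    return np.arccosh(1.0 + 2.0 * sq / max(du * dv, eps))

# The same Euclidean gap (0.1) is much "longer" hyperbolically near the
# boundary, reflecting the exponential growth of volume with radius.
a = poincare_distance(np.array([0.0, 0.0]), np.array([0.1, 0.0]))  # near origin
b = poincare_distance(np.array([0.8, 0.0]), np.array([0.9, 0.0]))  # near boundary
```
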
In the context of text retrieval, hyperbolic geometry has recently shown strong promise. HypRAG [20] introduces hyperbolic dense retrieval for RAG, developing two model variants in the Lorentz model: a fully hyperbolic transformer (HyTE-FH) and a hybrid architecture (HyTE-H). A key contribution is the Outward Einstein Midpoint (OEM), a geometry-aware pooling operator that provably preserves hierarchical structure during sequence aggregation, overcoming the radial contraction failure of naive Euclidean averaging. HypRAG achieves up to 29% gains over Euclidean baselines in context relevance on RAGBench, and demonstrates that hyperbolic representations encode document specificity through norm-based separation — with over 20% radial increase from general to specific concepts. HyperbolicRAG [58] projects embeddings into the Poincaré ball to encode hierarchical depth within a static knowledge graph, using dual-space retrieval that fuses Euclidean and hyperbolic rankings. HELM [59] introduces a family of hyperbolic language models that operate entirely in hyperbolic space for text generation, though not specifically targeting retrieval.
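
The radial-contraction failure of naive averaging can be illustrated with the classical Einstein midpoint in the Klein model (a simplified stand-in for OEM, whose exact form is defined in [20]): Lorentz-factor weighting gives near-boundary (more specific) points proportionally more mass, so the aggregate is not dragged toward the origin the way a plain Euclidean mean is.

```python
import numpy as np

def einstein_midpoint(points):
    """Einstein midpoint of Klein-model points (each row with ||x|| < 1)."""
    # Lorentz factor gamma = 1/sqrt(1 - ||x||^2) grows toward the boundary,
    # so specific (deep) points dominate the weighted average.
    gamma = 1.0 / np.sqrt(1.0 - np.sum(points ** 2, axis=1))
    return (gamma[:, None] * points).sum(axis=0) / gamma.sum()

pts = np.array([[0.9, 0.0],   # "specific" point near the boundary
                [0.0, 0.0]])  # "general" point at the origin
m = einstein_midpoint(pts)    # stays closer to the boundary than the mean
```
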
These works establish the viability of hyperbolic geometry for hierarchical text retrieval, but they exclusively address the semantic hierarchy of natural language documents (broad topics → specific entities). No existing work has applied hyperbolic geometry to represent the physical scale hierarchy of scientific observations, where the hierarchy arises not from semantic abstraction but from spatial resolution (coarse global survey → fine local imaging). AreoRAG introduces the scale-curvature correspondence principle (Proposition 1), which establishes that the resolution hierarchy of planetary remote sensing data is intrinsically hyperbolic, and couples spatial resolution with radial depth in the Lorentz model. Furthermore, we extend the OEM pooling operator with resolution-aware radial weighting (Spatial OEM, Eq. 13), ensuring that cross-resolution aggregation preserves fine-scale observational details rather than collapsing them into coarse-resolution summaries.
### C. Knowledge Conflict Detection and Resolution in RAG
Knowledge conflicts — situations where different information sources provide contradictory factual statements — pose a fundamental challenge to RAG systems [60]-[62]. Research on conflict handling can be broadly categorized into impact analysis and resolution strategies.
**Impact analysis.** Longpre et al. [60] first exposed entity-based knowledge conflicts in question answering, revealing that LLMs tend to rely on parametric memory when retrieved passages contain contradictory information. Xie et al. [61] found that LLMs are receptive to single external evidence but exhibit strong confirmation bias when presented with both supporting and conflicting information. Tan et al. [63] revealed a systematic bias toward self-generated contexts over retrieved ones, attributing this to higher query-context similarity of self-generated content. More recently, Tang et al. [21] formalized knowledge conflict in multimodal long-chain reasoning, distinguishing between input-level objective conflict and process-level effective conflict. Through probing internal representations, they revealed four key findings: (I) different conflict types are encoded as linearly separable features (>93% AUC with linear probes); (II) conflict signals concentrate in mid-to-late layers (depth localization); (III) aggregating token-level signals along trajectories robustly recovers input-level conflict types (hierarchical consistency); and (IV) reinforcing the model's implicit source preference is far easier than reversing it (directional asymmetry). These mechanistic insights provide the theoretical foundation for PICT's conflict classification approach.
**Resolution strategies.** Existing resolution methods operate at the token level or semantic level [64]-[67]. Token-level methods such as CD$^2$ [64] manipulate attention weights to suppress parametric knowledge when conflicts are detected. ASTUTE RAG [65] uses gradient-based attribution to identify and mask conflicting tokens during inference. Semantic-level methods include CK-PLUG [66], which develops adapter-based architectures for dynamic knowledge weighting, and FaithfulRAG [67], which externalizes LLMs' parametric knowledge and aligns it with retrieved context. TruthfulRAG [17] advances to factual-level resolution by constructing knowledge graphs from retrieved content, performing query-based graph retrieval, and applying entropy-based filtering to locate conflicting elements — specifically comparing retrieval-augmented entropy against parametric-only entropy ($\Delta H_p$) to identify corrective knowledge paths. MetaRAG [9] employs metacognitive strategies for hallucination mitigation through self-reflection mechanisms.
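
The entropy-comparison idea behind $\Delta H_p$ can be sketched as follows (an illustrative simplification, not TruthfulRAG's actual implementation): compare the model's answer-distribution entropy with and without the retrieved context, and flag a conflict when augmentation *increases* uncertainty beyond a threshold.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of a possibly unnormalized distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

def conflict_signal(p_parametric, p_augmented, tau=0.1):
    # If adding retrieved context raises answer entropy by more than tau,
    # the context plausibly disagrees with the model's parametric belief.
    delta = entropy(p_augmented) - entropy(p_parametric)
    return delta, delta > tau
```

For example, a sharply peaked parametric distribution that becomes flat after augmentation yields a large positive delta and triggers the flag; the reverse (context that sharpens the answer) does not.
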
A critical and unexamined assumption shared by all existing conflict-resolution methods is that inter-source inconsistency is inherently undesirable and should be eliminated. This assumption holds in domains where authoritative ground truth exists (e.g., financial records, encyclopedic facts). However, in scientific observation scenarios — particularly deep-space exploration — the absence of absolute ground truth means that inter-source disagreements may represent legitimate multi-dimensional observations of the same phenomenon rather than errors. AreoRAG introduces a fundamentally different paradigm: Physics-Informed Conflict Triage (PICT), which classifies conflicts by their physical origin and applies differentiated processing. By replacing TruthfulRAG's parametric-vs-augmented entropy ($\Delta H_p$) with cross-source interaction entropy ($\mathcal{H}_{inter}$, Eq. 14) and incorporating physical observation parameters alongside LLM hidden-state features for four-category conflict classification (Eqs. 18-19), PICT provably preserves scientifically valuable disagreements (Theorem 2) while maintaining noise-filtering capability.
### D. Intelligent Retrieval for Planetary Remote Sensing Data
Planetary remote sensing archives have grown to petabyte scale through missions such as Mars Reconnaissance Orbiter, Mars Express, Tianwen-1, Mars Science Laboratory, and Mars 2020 [1]-[4]. The primary access infrastructure — NASA's Planetary Data System (PDS) [68] and its Mars Orbital Data Explorer (ODE) [69] — provides metadata-driven search through spatial bounding box queries, temporal range filters, and instrument/product-type selectors. Similarly, CNSA's Lunar and Planetary Data Release System offers keyword-based retrieval for Chinese mission data [70]. The USGS Astrogeology Science Center maintains derived data products (DTMs, mosaics) with catalog-level metadata search [71].
However, these systems operate at the level of metadata keyword matching and do not support semantic understanding of query intent, cross-source reasoning, or natural language interaction. A scientist seeking "HiRISE images showing dust devil tracks near the equator" must manually translate this into a series of coordinate-bounded, instrument-filtered queries and visually inspect each returned product — a process that is both labor-intensive and prone to missing relevant observations cataloged under different terminology.
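
The manual workflow just described reduces to a metadata predicate of roughly the following shape (the record fields here are illustrative, not the actual PDS/ODE schema): the scientist must supply explicit instrument names and coordinate bounds, and anything cataloged under different terminology falls outside the filter.

```python
# Hypothetical product records; field names and IDs are illustrative only.
products = [
    {"id": "ESP_012345_1800", "instrument": "HiRISE", "lat": 2.1,  "lon": 154.0},
    {"id": "CTX_B01_010045",  "instrument": "CTX",    "lat": -4.5, "lon": 137.4},
]

def bbox_query(records, instrument, lat_min, lat_max, lon_min, lon_max):
    """Metadata-level search: exact instrument match + spatial bounding box."""
    return [r for r in records
            if r["instrument"] == instrument
            and lat_min <= r["lat"] <= lat_max
            and lon_min <= r["lon"] <= lon_max]

# "HiRISE images near the equator" must be hand-translated into this call;
# the query intent ("dust devil tracks") is not expressible at all.
hits = bbox_query(products, "HiRISE", -5.0, 5.0, 150.0, 160.0)
```
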
In the broader geospatial domain, the integration of AI with remote sensing data retrieval has gained momentum. GeoAI methods [72], [73] combine geographic information science with deep learning for tasks such as scene classification, object detection, and change detection. Recent work has explored the use of LLMs for geospatial reasoning [74], [75], including natural language interfaces for GIS queries and the interpretation of satellite imagery through vision-language models. Foundation models for remote sensing, such as those pre-trained on large-scale Earth observation data, have demonstrated the potential for cross-modal understanding [76], [77]. However, these efforts remain focused on Earth observation data and do not address the unique challenges of planetary science: the multi-platform observation geometry, the absence of ground truth for conflict adjudication, and the need for cross-resolution reasoning across vastly different spatial scales.
To the best of our knowledge, AreoRAG is the first framework that brings RAG capabilities to planetary remote sensing data retrieval. By constructing a spatially-grounded knowledge hypergraph with physics-informed conflict handling, AreoRAG transforms the planetary data retrieval paradigm from metadata keyword matching to semantic spatial reasoning, enabling natural language queries that involve spatial proximity, temporal evolution, cross-source correlation, and scientifically informed conflict interpretation.