# MultiRAG: A Knowledge-guided Framework for Mitigating Hallucination in Multi-source Retrieval Augmented Generation

Wenlong Wu¹, Haofen Wang², Bohan Li¹,³,⁴ ✉, Peixuan Huang¹, Xinzhe Zhao¹ and Lei Liang⁵

¹ College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education
² College of Design & Innovation, Tongji University
³ Key Laboratory of Intelligent Decision and Digital Operation, Ministry of Industry and Information Technology
⁴ Collaborative Innovation Center of Novel Software Technology and Industrialization
⁵ Ant Group Knowledge Graph Team

Email: {wuwenlong, bhli, peixuanh, xinzhe_zhao}@nuaa.edu.cn, carter.whfcarter@gmail.com, leywar.liang@antgroup.com

Abstract—Retrieval Augmented Generation (RAG) has emerged as a promising solution to address hallucination issues in Large Language Models (LLMs). However, the integration of multiple retrieval sources, while potentially more informative, introduces new challenges that can paradoxically exacerbate hallucination problems. These challenges manifest primarily in two aspects: the sparse distribution of multi-source data, which hinders the capture of logical relationships, and the inherent inconsistencies among different sources, which lead to information conflicts. To address these challenges, we propose MultiRAG, a novel framework designed to mitigate hallucination in multi-source retrieval-augmented generation through knowledge-guided approaches.
Our framework introduces two key innovations: (1) a knowledge construction module that employs multi-source line graphs to efficiently aggregate logical relationships across different knowledge sources, effectively addressing the sparse data distribution issue; and (2) a retrieval module that implements a multi-level confidence calculation mechanism, performing both graph-level and node-level assessments to identify and eliminate unreliable information nodes, thereby reducing hallucinations caused by inter-source inconsistencies. Extensive experiments on four multi-domain query datasets and two multi-hop QA datasets demonstrate that MultiRAG significantly enhances the reliability and efficiency of knowledge retrieval in complex multi-source scenarios. Our code is available at https://github.com/wuwenlong123/MultiRAG

Index Terms—Retrieval Augmented Generation, Large Language Models, Multi-source Retrieval, Knowledge Graphs, Hallucination Mitigation

## I. INTRODUCTION

Large Language Models (LLMs) have achieved remarkable success in handling a variety of natural language processing tasks, attributable to their robust capabilities in understanding and generating language and symbols [1]. In knowledge-intensive retrieval tasks, Retrieval Augmented Generation (RAG) has become a standardized solution paradigm [2]-[4]. Previous works [5]-[11] have made significant strides in addressing the inherent knowledge limitations of LLMs: by introducing external knowledge bases, they have markedly improved the accuracy and fidelity of LLM responses. However, recent studies have highlighted a significant drawback: the retrieval results of RAG are imperfect, including irrelevant, misleading, and even malicious information, ultimately leading to inaccurate LLM responses. To address these limitations, the synergy between LLMs and Knowledge Graphs (KGs) has been proposed to achieve more efficient information retrieval [12].
On one hand, KGs can efficiently store data with fixed characteristics (such as temporal KGs, event KGs, etc.), thereby enhancing the processing capabilities of LLMs on specific data [13]-[20]. On the other hand, the collaboration between LLMs and KGs has significantly improved performance in multi-hop and multi-document question answering, including the credibility and interpretability of retrieval [21]. Furthermore, LLM-KG collaborative methods have also provided the latest solutions for knowledge-intensive retrieval tasks [22]-[26], propelling the deep reasoning capabilities of RAG. Nevertheless, existing frameworks still fail to account for the complexity of real-world data. Although RAG can mitigate hallucinations, these hallucinations often stem from the internal knowledge of LLMs [27]-[29]. Inconsistent information sources and unreliable retrieval methods can still lead to retrieval biases and hallucinations in LLMs. This issue becomes particularly pronounced in information retrieval tasks that involve multi-source knowledge, where hallucinations are more prominent. Research [30] indicates that approximately 70% of retrieved paragraphs do not directly contain the correct query answers but instead include information indirectly related to the answers, causing misguidance and comprehension bias in LLMs. Building upon the categorization of hallucinations in retrieval [9], we outline the three most common types of hallucinations encountered in multi-source data retrieval:

--- Wenlong Wu and Haofen Wang contributed equally to this work. Bohan Li is the corresponding author. ---

Fig. 1: Single-source Retrieval & Multi-source Retrieval

1) Inter-source data inconsistency: Discrepancies between different data sources can lead to conflicting information, causing hallucinations in LLMs.
2) Redundancy of similar data: Highly similar and semantically equivalent data often exists across multiple data sources, which can impose significant computational overhead on retrieval.

3) Incomplete inference paths: Forming a comprehensive inference path from different data sources is challenging. Existing retrievers often fail to capture the complete logical associations within multiple data sources.

Fig. 1 illustrates the differences between single-source and multi-source data retrieval through the analysis of flight CA981. The sparse distribution and inconsistency of data are issues unique to multi-source data retrieval, leading to severe hallucination bias in LLMs. Against this backdrop, we focus on addressing retrieval hallucinations in multi-source data retrieval to empower knowledge-augmented generation. This work primarily explores the following two fundamental challenges:

1) Sparse Distribution of Multi-source Data: Multi-domain queries require fusing structured (SQL tables), semi-structured (JSON logs), and unstructured (text reports) data. Due to the variability in data storage formats and data sparsity, the connectivity between knowledge elements is low, making it difficult for RAG systems to effectively capture logical associations across sources, thereby affecting the recall rate and quality of retrieval results.

2) Inter-source Data Inconsistency: Meanwhile, the inherent diversity in knowledge representations across multi-source data often leads to inconsistencies in retrieved fragments. These discrepancies may induce information conflicts during retrieval, thereby compromising response accuracy. This challenge becomes particularly pronounced in domain-specific complex reasoning and multi-hop question answering tasks.

To address the issues above, we propose MultiRAG, a novel framework designed to mitigate hallucination in multi-source retrieval augmented generation through knowledge-guided approaches.
Initially, we introduce multi-source line graphs for rapid aggregation of knowledge sources, tackling the issues arising from sparse data distribution. Subsequently, based on these integrated multi-source line graphs, we propose a multi-level confidence calculation method to ensure the reliability of multi-source data queries. This approach not only enhances query efficiency but also strengthens the accuracy of results, providing an effective solution for multi-source knowledge-guided RAG. The contributions of this paper are summarized as follows:

1) Multi-source Knowledge Aggregation: In the knowledge construction module, we introduce multi-source line graphs as a data structure for rapid aggregation and reconstruction of knowledge structures from multiple query-relevant data sources. This effectively captures inter-source data dependencies within chunk texts, thereby providing a unified and centralized representation of multi-source knowledge.

2) Multi-level Confidence Calculation: In the retrieval module, we perform graph-level and node-level confidence calculations on the extracted knowledge subgraphs. The aim is to filter out and eliminate low-quality subgraphs and inconsistent retrieval nodes, ultimately enhancing the quality of the text embedded in the context to alleviate retrieval hallucinations.

3) Experimental Validation and Performance Comparison: We conducted extensive experiments on existing multi-source retrieval datasets and two complex Q&A datasets, comparing our approach with existing state-of-the-art (SOTA) methods. The results demonstrate the robustness and accuracy of our proposed method in retrieval performance; in multi-source data retrieval tasks in particular, our method outperforms other SOTA methods by more than 10%.

## II. PRELIMINARY

In the field of Knowledge-Guided RAG, the primary challenges include efficiently accessing relevant knowledge and achieving reliable retrieval performance.
This section introduces the core elements of our approach and precisely defines the problems we address. Let \( Q = \{q_1, q_2, \ldots, q_n\} \) be the set of query instances, where each \( q_i \) corresponds to a distinct query. Let \( E = \{e_1, e_2, \ldots, e_m\} \) be the set of entities in the knowledge graph, where each \( e_j \) represents an entity. Let \( R = \{r_1, r_2, \ldots, r_p\} \) be the set of relationships in the knowledge graph, where each \( r_k \) represents a relationship. Let \( D = \{d_1, d_2, \ldots, d_t\} \) be the set of documents, where each \( d_l \) represents a document. We define the knowledge-guided retrieval augmented generation problem as follows:

\[ \arg\max_{d_l \in D} LLM\left( q_i, d_l \right), \quad \sum_{e_j \in E}\sum_{r_k \in R} KG\left( e_j, r_k, d_l \right) \tag{1} \]

where \( LLM\left( q_i, d_l \right) \) denotes the relevance score between query \( q_i \) and document \( d_l \) as assessed by the LLM, and \( KG\left( e_j, r_k, d_l \right) \) represents the degree of match between entity \( e_j \), relationship \( r_k \), and document \( d_l \). Furthermore, we optimize the knowledge construction and retrieval modules by introducing multi-source line graphs to accelerate knowledge establishment and enhance retrieval robustness. Specifically, the proposed approach is formally defined as follows:

Definition 1. Multi-source data fusion. Given a set of sources \( H \), the data \( D = \{d, \text{name}, c, \text{meta}\} \) exists, where \( d \) represents the domain of the data, \( c \) represents the content of the data file, name represents the file/attribute name, and meta represents the file metadata.
Through a multi-source data fusion algorithm, we obtain normalized data \( \widehat{D} = \{id, d, \text{name}, jsc, \text{meta}, \text{cols\_index}\} \). Here, \( id \) represents the unique identifier assigned during normalization, \( d \) indicates the domain where the data file is located, name denotes the data file name, meta denotes the file metadata, and \( jsc \) denotes the file content stored using JSON-LD. If the stored data is structured data, or another data format that can use a columnar storage model, the column index cols_index of all attributes is also stored for rapid retrieval and query. Fig. 2 provides an example of the JSON-LD format.

Definition 2. Multi-source line graph [31]. Given a multi-source knowledge graph \( \mathcal{G} \) and a transformed knowledge graph \( \mathcal{G}' \) (multi-source line graph, MLG), the MLG satisfies the following characteristics: 1) a node in \( \mathcal{G}' \) represents a triple; 2) there is an associated edge between any two nodes in \( \mathcal{G}' \) if and only if the triples represented by these two nodes share a common node. Based on this definition, it can be inferred that the MLG achieves high aggregation of related nodes, which can greatly improve the efficiency of data retrieval and accelerate subsequent retrieval and query algorithms.

Definition 3. Multi-source homologous data. Any two nodes \( v_1 \) and \( v_2 \) in \( \mathcal{G} \) are defined as multi-source homologous if and only if they belong to the same retrieval candidate set in a single search.

Definition 4. Homologous node and homologous subgraph.
Given a set of multi-domain homologous data \( SV = \{v_i\}_{i=1}^{n} \) in the knowledge graph \( \mathcal{G} \), we define the homologous center node as snode \( = \{\text{name}, \text{meta}, \text{num}, C(v)\} \), the set of homologous nodes as \( U_{\text{snode}} \), and the set of homologous edges as \( E_{\text{snode}} \). Here, name represents the common attribute name, meta denotes the identical file metadata, num indicates the number of homologous data instances, and \( C(v) \) represents the data confidence. We define the association edge between snode and \( v_i \) as \( e_i \), carrying the weight \( w_i \) of node \( v_i \) in the data confidence calculation. Thus, the homologous center node, together with \( U_{\text{snode}} \) and \( E_{\text{snode}} \), forms the homologous subgraph subSG.

```json
{
  "@context": "https://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
```

Fig. 2: Data format of JSON-LD

Definition 5. Homologous triple line graph. All homologous subgraphs within the knowledge graph \( \mathcal{G} \) collectively constitute the homologous knowledge graph \( S\mathcal{G} \). By performing a line graph transformation on the homologous knowledge graph, we obtain the homologous triple line graph \( S\mathcal{G}' \). By constructing a homologous triple line graph, multi-source homologous data are aggregated into a single subgraph centered around homologous nodes, enabling rapid consistency checks and conflict feedback for homologous data. Additionally, the knowledge graph contains a significant number of isolated nodes (i.e., nodes without homologous data), which are also incorporated into the homologous triple line graph.

Definition 6. Candidate graph confidence and candidate node confidence.
For a query \( Q(q, \mathcal{G}) \) on the knowledge graph \( \mathcal{G} \), the corresponding homologous line graph \( S\mathcal{G}' \) is obtained. The candidate graph confidence is an estimation of the confidence in a candidate homologous subgraph, assessing the overall credibility of the candidate graph; the candidate node confidence is an assessment of the confidence in individual nodes, determining the credibility of a single attribute node.

## III. METHODOLOGY

## A. Framework of MultiRAG

This section elaborates on the implementation of MultiRAG, as shown in Fig. 3. The first step segments and extracts multi-source data to construct the corresponding MLG, achieving preliminary aggregation of multi-source data. The second step reconstructs the MLG and performs subgraph extraction to identify candidate homologous subgraphs, ensuring consistent storage of homologous data for subsequent hallucination assessment. The third step calculates the graph-level and node-level confidence of the candidate subgraphs, eliminating low-quality nodes to enhance the credibility of the response, and returns the extracted trustworthy subgraphs to the LLM to form the final answer. Finally, the aforementioned steps are integrated into the Multi-source Knowledge Line Graph Prompting algorithm, MKLGP.

## B. Multi-source Line Graph Construction

MultiRAG initially employs an adapter structure to integrate multi-source data and standardize its storage format. For practical application scenarios, data is obtained directly from various non-homologous formats and transformed into a unified, normalized representation. Specifically, file names and metadata are parsed, and the domains to which the files belong are categorized. Subsequently, the data content is parsed and stored in JSON-LD format, thereby transforming it into linked data.
Finally, unique identifiers are assigned to the data, resulting in normalized datasets.

Fig. 3: Framework of MultiRAG, including three modules.

Specifically, a unique adapter is designed for each distinct data format to facilitate data parsing. Although the implementation frameworks of these adapters are largely similar, it is essential to differentiate between the parsing processes for structured, semi-structured, and unstructured data. For structured data, parsing involves storing tabular information in JSON format, where attribute variables within the file are managed using a Decomposition Storage Model (DSM). This enables the extraction of all attribute information for consistency checks through the use of column indices. For semi-structured data, parsing corresponds to storing tree-shaped data in JSON format with multi-layer nested structures. This data format lacks column indices and does not support fast retrieval, necessitating the use of tree or graph traversal algorithms, such as DFS, for efficient searching. Finally, for unstructured data, the focus is currently limited to textual information, which is stored directly; subsequent steps leverage LLMs for entity and relationship extraction tasks to obtain the relevant information. The final integration of multi-source data can be expressed by the following formula:

\[ D_{\text{Fusion}} = \bigcup_{i=1}^{n} A_i\left( D_i \right) \tag{2} \]

where \( A_i \in \{Ada_{\text{stru}}, Ada_{\text{semi-s}}, Ada_{\text{unstru}}\} \) represents the adapter parsing functions for structured, semi-structured, and unstructured data, respectively, and \( D_i \in \{D_{\text{stru}}, D_{\text{semi-s}}, D_{\text{unstru}}\} \) represents the corresponding original datasets.
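To make the adapter-based fusion of Equation (2) concrete, the following is a minimal sketch, not the MultiRAG implementation: the function names (`ada_stru`, `ada_semi_s`, `ada_unstru`, `fuse`), the abbreviated JSON-LD dictionary, and the sample flight records are all hypothetical illustrations of the normalized form \( \widehat{D} \) from Definition 1.

```python
import uuid

def ada_stru(name, domain, rows, meta):
    """Adapter for structured (tabular) data: pivot rows into a
    decomposition-storage-model (DSM) layout so every attribute can be
    scanned independently via cols_index."""
    cols = {}
    for row in rows:                      # rows: list of {attribute: value}
        for attr, value in row.items():
            cols.setdefault(attr, []).append(value)
    return {"id": str(uuid.uuid4()), "d": domain, "name": name,
            "jsc": {"@context": {}, "@graph": rows},   # abbreviated JSON-LD
            "meta": meta, "cols_index": cols}

def ada_semi_s(name, domain, tree, meta):
    """Adapter for semi-structured (nested) data: stored as-is with no
    column index, so later retrieval falls back to DFS over the tree."""
    return {"id": str(uuid.uuid4()), "d": domain, "name": name,
            "jsc": tree, "meta": meta, "cols_index": None}

def ada_unstru(name, domain, text, meta):
    """Adapter for unstructured text: stored directly; entity/relation
    extraction is deferred to an LLM in a later stage."""
    return {"id": str(uuid.uuid4()), "d": domain, "name": name,
            "jsc": {"text": text}, "meta": meta, "cols_index": None}

def fuse(*parsed_sources):
    """D_Fusion as the union of all adapter outputs, keyed by id (Equation 2)."""
    return {rec["id"]: rec for recs in parsed_sources for rec in recs}

# Hypothetical records in the spirit of the CA981 example from Fig. 1.
flights = [ada_stru("flights.csv", "aviation",
                    [{"flight": "CA981", "status": "delayed"},
                     {"flight": "CA982", "status": "on-time"}],
                    {"source": "airline_db"})]
reports = [ada_unstru("report.txt", "news",
                      "Flight CA981 departed late from PEK.",
                      {"source": "news_site"})]
d_fusion = fuse(flights, reports)
```

The column index built by `ada_stru` is what later permits attribute-level consistency checks without re-reading whole records, mirroring the DSM rationale above.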
Through the parsed data \( D_{\text{Fusion}} = \{E_q, R_q\} \), we further extract key information and link it to the knowledge graph. The knowledge construction process involves three key phases implemented through the OpenSPG framework¹ [26], [32], in which we use the Custom Prompt module² to integrate LLM-based knowledge extraction. For entity recognition, we utilize the ner.py prompts within the kag/builder/prompt/default directory. We first define relevant entity types in the schema. Then, by adjusting the example.input and example.output in the ner.py prompts, we guide the LLM-based SchemaFreeExtractor to identify entities accurately. In relationship extraction, the triple.py prompts play a crucial role. We define relationships in the schema and use the triple_prompt in the SchemaFreeExtractor. The instruction in triple.py ensures that the extracted Subject-Predicate-Object (SPO) triples are related to the entities in the entity_list, enabling effective relationship extraction. Regarding attribute extraction, we rely on the entity standardization prompts in std.py. After entity recognition, the std_prompt in the SchemaFreeExtractor standardizes the entities and helps in extracting their attributes. We modify the example.input, example.named_entities, and example.output in std.py according to our data characteristics to optimize the attribute extraction process. Through these steps of customizing and applying OpenSPG's prompts, we achieve efficient knowledge extraction. The following formula describes the data extraction process:

\[ KB = \sum_{D_i}\left( \{e_1, e_2, \ldots, e_m\} \sqcup \{r_1, r_2, \ldots, r_n\} \right) \tag{3} \]

--- ¹https://github.com/OpenSPG/openspg ²https://openspg.yuque.com/ ---

Fig. 4: Example of multi-source line graph transformation

## C. Homologous Subgraph Matching

After the preliminary extraction of information, the next step is to identify the multi-source homologous data group set \( \mathcal{SV}s \) and the isolated point set \( \mathcal{LV}s \). This process begins by initializing the unvisited node set \( \mathcal{U}_{\text{unvisited}} = \mathcal{V} \), while setting the homologous data group \( \mathcal{SV}s = \varnothing \) and the isolated point set \( \mathcal{LV}s = \varnothing \). We traverse all nodes and retrieve node information from the various domains; for matched homologous data, we construct the homologous node \( sg_i \) and its corresponding associated edge \( e_i \), and add them to the homologous node set \( \mathcal{U}_{sg} \) and edge set \( \mathcal{E}_{sg} \), respectively. After the traversal, \( (\mathcal{U}_{sg}, \mathcal{E}_{sg}) \) is added to \( \mathcal{SV}s \). If no homologous data is found for a node after one round of traversal, the node is added to the isolated point set \( \mathcal{LV}s \). Once visited, a node is removed from the \( \mathcal{U}_{\text{unvisited}} \) set. The time complexity of homologous subgraph matching is \( O(n \log n) \), where \( n \) is the number of nodes in the knowledge graph \( \mathcal{G} \). For each homologous subgraph in \( \mathcal{SV}s \), the homologous linear knowledge subgraph \( subS\mathcal{G}'_i \) is constructed by utilizing the homologous node set \( \mathcal{U}_{sg} \) and the homologous edge set \( \mathcal{E}_{sg} \). Subsequently, all \( subS\mathcal{G}'_i \) and the isolated point set \( \mathcal{LV}s \) are aggregated to obtain the homologous linear knowledge graph \( S\mathcal{G}' \).
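The matching procedure above, together with the line-graph transform of Definition 2, can be sketched in Python. This is an illustrative sketch rather than the MultiRAG implementation: grouping triples by their (subject, predicate) pair is a simplifying assumption for "same attribute across sources", and the flight triples are hypothetical.

```python
from itertools import combinations

def to_line_graph(triples):
    """Line-graph transform (Definition 2): every triple becomes a node;
    two nodes are connected iff their triples share an entity."""
    edges = [(i, j)
             for (i, t1), (j, t2) in combinations(enumerate(triples), 2)
             if {t1[0], t1[2]} & {t2[0], t2[2]}]   # shared subject/object
    return list(triples), edges

def match_homologous(triples):
    """Group triples that describe the same attribute across sources;
    groups of size 1 become isolated points (LVs)."""
    groups = {}
    for t in triples:
        groups.setdefault((t[0], t[1]), []).append(t)  # key: (subject, predicate)
    svs = {k: v for k, v in groups.items() if len(v) > 1}
    lvs = [v[0] for v in groups.values() if len(v) == 1]
    return svs, lvs

# Four sources report the status of flight CA981; one source reports a gate.
triples = [
    ("CA981", "status", "delayed"),
    ("CA981", "status", "late"),
    ("CA981", "status", "delayed"),
    ("CA981", "status", "on-time"),
    ("CA100", "gate", "B12"),
]
svs, lvs = match_homologous(triples)
nodes, edges = to_line_graph(svs[("CA981", "status")])
# The four homologous triples are pairwise linked, so their line graph is a
# complete graph of order 4 (6 edges), matching the Fig. 4 example.
```

Running the grouping first keeps the pairwise line-graph construction confined to small homologous groups rather than the whole graph, which is the efficiency point made in Definition 2.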
It should be noted that \( S\mathcal{G}' \) is solely used for consistency checks and retrieval queries over homologous data; other types of queries are still conducted on the original knowledge graph \( \mathcal{G} \). Here, we provide a simple example of a homologous triple line graph. As shown in Fig. 4, a homologous node is associated with 4 homologous data points. After transformation into a triple line graph, it forms a complete graph of order 4, indicating that the four triples are pairwise homologous.

## D. Multi-level Confidence Computing

We define the candidate data from different domains obtained in a single retrieval as multi-source homologous data. These data have been extracted into a homologous line graph for temporary storage. Although targeting the same query object, they often provide inconsistent reference answers. Considering the varying retrieval errors, a multi-level confidence calculation method is adopted in this framework. First, the confidence of each homologous line graph is calculated, followed by the confidence of each candidate node, to determine the final set of answer candidates.

1) Graph-Level Confidence Computing: In the first stage, a confidence calculation method based on mutual information entropy is introduced to assess the confidence of homologous line graphs. The core idea is that if two nodes with the same attributes in a homologous line graph are close in content, their similarity is high, and thus their confidence is also high; otherwise, their confidence is low. Let \( \mathcal{G} \) be a homologous line graph, and \( \mathcal{N}(\mathcal{G}) \) be the set of nodes in the graph. For any two nodes \( v_i, v_j \in \mathcal{N}(\mathcal{G}) \) with the same attributes, the similarity \( S(v_i, v_j) \) between them is defined based on mutual information entropy.
The mutual information entropy \( I(v_i, v_j) \) measures the interdependence of the attribute content of the two nodes and is calculated as:

\[ I\left( v_i, v_j \right) = \sum_{x \in V_i}\sum_{y \in V_j} p\left( x, y \right) \log\left( \frac{p\left( x, y \right)}{p\left( x \right) p\left( y \right)} \right) \tag{4} \]

where \( V_i \) and \( V_j \) are the sets of attribute values for nodes \( v_i \) and \( v_j \), respectively, \( p(x, y) \) is the joint probability distribution of \( v_i \) taking attribute value \( x \) and \( v_j \) taking attribute value \( y \), and \( p(x) \) and \( p(y) \) are the marginal probability distributions of \( x \) and \( y \), respectively. The similarity \( S(v_i, v_j) \) is defined as the normalized form of mutual information entropy to ensure that its value lies within the interval \( [0, 1] \):

\[ S\left( v_i, v_j \right) = \frac{I\left( v_i, v_j \right)}{H\left( V_i \right) + H\left( V_j \right)} \tag{5} \]

where \( H(V_i) \) and \( H(V_j) \) are the entropies of the attribute value sets of nodes \( v_i \) and \( v_j \), respectively, calculated as:

\[ H\left( V \right) = -\sum_{x \in V} p\left( x \right) \log p\left( x \right) \tag{6} \]

Subsequently, the confidence \( C(\mathcal{G}) \) of the homologous line graph \( \mathcal{G} \) is determined by averaging the similarity \( S(v_i, v_j) \) over all ordered node pairs in the graph:

\[ C\left( \mathcal{G} \right) = \frac{1}{\left| \mathcal{N}\left( \mathcal{G} \right) \right|^2 - \left| \mathcal{N}\left( \mathcal{G} \right) \right|}\sum_{v_i \in \mathcal{N}\left( \mathcal{G} \right)}\sum_{\substack{v_j \in \mathcal{N}\left( \mathcal{G} \right) \\ j \neq i}} S\left( v_i, v_j \right) \tag{7} \]

where \( \left| \mathcal{N}\left( \mathcal{G} \right) \right| \) denotes the number of nodes in the graph. Notably, a homologous line graph with high confidence indicates that its constituent nodes maintain strong attribute-level consistency across their content representations.

2) Node-Level Confidence Computing: In the second phase, the confidence \( C(v) \) of each individual node is calculated, taking into account the node's consistency, authority, and historical confidence. The detailed calculation methods and formulas are given below.

Algorithm 1 Multi-level Confidence Computing Algorithm

---
procedure CONFIDENCE_COMPUTING\( (v, D) \)
&nbsp;&nbsp;\( S_n(v) \leftarrow \) Equation (8)
&nbsp;&nbsp;\( \text{Auth}_{LLM}(v) \leftarrow \) Equation (10)
&nbsp;&nbsp;\( \text{Auth}_{hist}(v) \leftarrow \) Equation (11)
&nbsp;&nbsp;\( A(v) \leftarrow \) Equation (9)
&nbsp;&nbsp;\( C(v) \leftarrow S_n(v) + A(v) \)
&nbsp;&nbsp;return \( C(v) \)
end procedure

procedure MCC\( (\mathcal{G}, Q, D) \)
&nbsp;&nbsp;\( \mathcal{SV}s \leftarrow \varnothing, \mathcal{LV}s \leftarrow \varnothing \)
&nbsp;&nbsp;\( \mathcal{U}_{\text{unvisited}} \leftarrow \mathcal{V} \)
&nbsp;&nbsp;while \( \mathcal{U}_{\text{unvisited}} \neq \varnothing \) do
&nbsp;&nbsp;&nbsp;&nbsp;\( v \leftarrow \) pop a node from \( \mathcal{U}_{\text{unvisited}} \)
&nbsp;&nbsp;&nbsp;&nbsp;for all \( D_i \in D \) do
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if \( v \in \text{Data}(Q, subS\mathcal{G}'_i) \) then
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\( C(v) \leftarrow \) CONFIDENCE_COMPUTING\( (v, D_i) \)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if \( C(v) > \theta \) then
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\( \mathcal{U}_{sg} \leftarrow \mathcal{U}_{sg} \cup \{v\} \)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\( \mathcal{E}_{sg} \leftarrow \mathcal{E}_{sg} \cup \{e_i\} \)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\( \mathcal{LV}s \leftarrow \mathcal{LV}s \cup \{v\} \)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end if
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end if
&nbsp;&nbsp;&nbsp;&nbsp;end for
&nbsp;&nbsp;&nbsp;&nbsp;if \( \mathcal{U}_{sg} \neq \varnothing \) then
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\( \mathcal{SV}s \leftarrow \mathcal{SV}s \cup (\mathcal{U}_{sg}, \mathcal{E}_{sg}) \)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\( \mathcal{U}_{sg} \leftarrow \varnothing, \mathcal{E}_{sg} \leftarrow \varnothing \)
&nbsp;&nbsp;&nbsp;&nbsp;end if
&nbsp;&nbsp;end while
&nbsp;&nbsp;return \( \mathcal{SV}s, \mathcal{LV}s \)
end procedure
---

a) Node Consistency Score: The node consistency score \( S_n(v) \) reflects the consistency of the node across different data sources. We use mutual information entropy to calculate the similarity between node pairs, thereby assessing consistency. For a node \( v \), its consistency score is:

\[ S_n\left( v \right) = \frac{1}{\left| N\left( v \right) \right|}\sum_{u \in N\left( v \right)} S\left( v, u \right) \tag{8} \]

where \( N(v) \) is the set of nodes with the same attributes as node \( v \), and \( S(v, u) \) is the similarity between nodes \( v \) and \( u \) as defined in Equation (5).

b) Node Authority Score: The authority score is divided into two parts: the node's authority as assessed by the LLM and the node's historical authority. This score reflects the importance and authenticity of the node. We use an expert LLM to comprehensively evaluate the authority of the node. The node's authority score \( A(v) \) is calculated as:

\[ A\left( v \right) = \alpha \cdot \text{Auth}_{LLM}\left( v \right) + \left( 1 - \alpha \right) \cdot \text{Auth}_{\text{hist}}\left( v \right) \tag{9} \]

where \( \alpha \) is a weight coefficient that balances the contributions of LLM-assessed authority and historical authority, satisfying \( 0 \leq \alpha \leq 1 \).
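The graph-level and node-level computations in Equations (4)-(9) can be sketched as follows. This is an illustrative sketch under simplifying assumptions: each node's attribute content is represented as a list of sampled values, the LLM and historical authority scores are taken as given inputs rather than produced by a model, and two degenerate (constant, identical) value samples are treated as fully similar by convention.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(V_i; V_j) of two equal-length
    value samples (Equation 4)."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def entropy(xs):
    """Shannon entropy H(V) of a value sample (Equation 6)."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def similarity(xs, ys):
    """Normalized mutual information S(v_i, v_j) (Equation 5).
    Constant identical samples (zero entropy) are treated as fully similar."""
    h = entropy(xs) + entropy(ys)
    return mutual_information(xs, ys) / h if h > 0 else 1.0

def graph_confidence(nodes):
    """C(G): average similarity over all ordered node pairs (Equation 7)."""
    n = len(nodes)
    total = sum(similarity(nodes[i], nodes[j])
                for i in range(n) for j in range(n) if j != i)
    return total / (n * n - n)

def node_confidence(v_vals, neighbors, auth_llm, auth_hist, alpha=0.5):
    """C(v) = S_n(v) + A(v): mean similarity to same-attribute nodes
    (Equation 8) plus the alpha-weighted authority mix (Equation 9)."""
    s_n = sum(similarity(v_vals, u) for u in neighbors) / len(neighbors)
    return s_n + alpha * auth_llm + (1 - alpha) * auth_hist
```

Note that with the \( H(V_i) + H(V_j) \) denominator of Equation (5), two identically distributed non-constant samples score 0.5 rather than 1, since \( I(V_i; V_j) \leq \min(H(V_i), H(V_j)) \); the normalization guarantees the \( [0, 1] \) range claimed above.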
Algorithm 2 Multi-source Knowledge Line Graph Prompting

---
procedure MKLGP\( (q) \)
&nbsp;&nbsp;\( E_q, R_q \leftarrow \) Logic Form Generation\( (q) \)
&nbsp;&nbsp;\( D_q \leftarrow \) Multi Document Extraction\( (V_q) \)
&nbsp;&nbsp;\( S\mathcal{G}' \leftarrow \) Prompt\( (D_q) \)
&nbsp;&nbsp;\( \mathcal{SV}s, \mathcal{LV}s \leftarrow \) MCC\( (S\mathcal{G}', q, D_q) \)
&nbsp;&nbsp;\( C_{\text{nodes}}, \mathcal{G}_A \leftarrow \) Prompt\( (\mathcal{SV}s, \mathcal{LV}s) \)
&nbsp;&nbsp;Answer \( \leftarrow \) Generating Trustworthy Answers\( (C_{\text{nodes}}, \mathcal{G}_A) \)
&nbsp;&nbsp;return Answer
end procedure
---

Benefiting from the knowledge-credibility calculation in PTCA [33], \( \text{Auth}_{LLM}(v) \) is assessed via the global influence and local connection strength of the node. The LLM comprehensively computes the credibility of knowledge by integrating the association strength between entities, entity type information, and multi-step path information.

\[ \text{Auth}_{LLM}\left( v \right) = \frac{1}{1 + e^{-\beta \cdot C_{LLM}\left( v \right)}} \tag{10} \]

where \( C_{LLM}(v) \) is the authority score provided by the LLM for node \( v \), centered on the average \( C_{LLM} \) value over all nodes, and \( \beta \) is a parameter that controls the steepness of the scoring curve.

c) Historical Authority: \( \text{Auth}_{\text{hist}}(v) \) is an authority score based on the node's historical data. Inspired by Zhu's work [34], we use the credibility of historical data sources together with the current query-related data for incremental estimation.
\[ \operatorname{Auth}_{hist}(v) = \frac{\mathcal{H} \cdot \Pr^{h}(D) + \sum_{v_{p} \in D_{v}[q]} \Pr(v_{p})}{\mathcal{H} + \left| \operatorname{Data}(q, subS\mathcal{G}^{\prime}_{i}) \right|} \tag{11} \]
where \( \mathcal{H} \) is the number of entities provided by data source \( D \) for all historical queries, \( \Pr^{h}(D) \) is the historical credibility of data source \( D \), \( D_{v}[q] \) is the set of correct answers, and \( \operatorname{Data}(q, subS\mathcal{G}^{\prime}_{i}) \) is the query-related data obtained from the multi-source line subgraph.
Ultimately, we design the multi-level confidence computing algorithm, MCC, to calculate the credibility of the data sources in each homologous subgraph, ensuring the quality of the knowledge graph embedded into the LLM. The procedure is shown in Algorithm 1. Note that MCC does not directly output the final graph and node confidences; these values are obtained through prompting.
## E. Multi-source knowledge line graph prompting
We propose the Multi-source Knowledge Line Graph Prompting (MKLGP) algorithm for multi-source data retrieval. Given a user query \( q \), an LLM is first employed to extract the intent, entities, and relationships from \( q \) and to generate the corresponding logical relationships. The dataset then undergoes multi-document filtering to derive text chunks, followed by construction of a Multi-source Line Graph (MLG) for knowledge aggregation. MKLGP then matches homogeneous subgraphs and applies the MCC algorithm to obtain the sets of credible query nodes and isolated points \( \mathcal{SV}s, \mathcal{LV}s \).
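To make the confidence machinery concrete, the two authority components combined by Eq. (9) can be sketched together: the logistic squashing of Eq. (10) and the incremental historical estimate of Eq. (11). Function names, argument names, and the default \( \beta \) are our assumptions, not the paper's code.

```python
import math

def auth_llm(c_llm, beta=1.0):
    """Eq. (10): squash the LLM-provided credibility score C_LLM(v)
    into (0, 1) with a logistic curve; beta sets the steepness."""
    return 1.0 / (1.0 + math.exp(-beta * c_llm))

def auth_hist(h_count, pr_hist, correct_probs, n_query_data):
    """Eq. (11): incremental historical-authority estimate blending a
    source's historical credibility with current-query evidence.
    h_count       -- H, entities the source supplied over all past queries
    pr_hist       -- Pr^h(D), historical credibility of source D
    correct_probs -- Pr(v_p) for each correct answer v_p in D_v[q]
    n_query_data  -- |Data(q, subSG'_i)|, query-related data in the subgraph
    """
    return (h_count * pr_hist + sum(correct_probs)) / (h_count + n_query_data)
```

Note how Eq. (11) behaves like a smoothed average: a source with a long, credible history (large \( \mathcal{H} \), high \( \Pr^{h}(D) \)) is only slowly pulled down by a few weak current-query matches, while a new source is dominated by the current evidence.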
Finally, prompting yields the graph confidence and the node confidence, which together enhance the credibility of the answer. The results are then embedded into the LLM's context to generate a trustworthy retrieval answer.
## IV. EXPERIMENTS
This section presents experiments and performance analysis of the homologous line graph construction and multi-level confidence calculation modules. We compare MultiRAG against SOTA multi-document retrieval QA methods, data fusion methods, and KBQA methods. Extensive experiments assess the robustness and efficiency of MultiRAG, aiming to answer the following questions.
- Q1: How does the retrieval recall performance of MultiRAG compare with other data fusion models and SOTA data retrieval models?
- Q2: What are the respective impacts of data sparsity and data inconsistency on the quality of retrieval recall?
- Q3: How effective is each of the two modules of MultiRAG individually?
- Q4: How does MultiRAG perform on multi-hop QA datasets after incorporating multi-level confidence calculation?
- Q5: What are the time costs of the various modules in MultiRAG?
## A. Experimental Settings
a) Datasets: To validate the efficiency of multi-source line graph construction and its enhancement of retrieval performance, we conduct multi-source data fusion experiments on four real-world benchmark datasets [35]-[37], as shown in Table I. (1) The movie dataset comprises movie data collected from 13 sources. (2) The book dataset includes book data from 10 sources. (3) The flight dataset gathers information on over 1200 flights from 20 sources. (4) The stock dataset collects transaction data for 1000 stock symbols from 20 sources. In the experiments, we issue 100 queries against each of the four datasets to verify retrieval efficiency.
It is noteworthy that the Movies and Flights datasets are relatively dense, while the Books and Stocks datasets are relatively sparse, which can affect model performance. Additionally, to validate the robustness of MultiRAG on complex Q&A datasets, we selected two multi-hop question answering datasets, HotpotQA [38] and 2WikiMultiHopQA [39]. Both are constructed from Wikipedia documents, allowing us to use a consistent document corpus and retriever to provide external references for LLMs. Considering experimental costs, we conducted a subsample analysis on 300 questions from the validation set of each dataset.
TABLE I: Statistics of the preprocessed datasets
| Datasets | Data source | Sources | Entities | Relations | Queries |
| --- | --- | --- | --- | --- | --- |
| Movies | JSON (J) | 4 | 19701 | 45790 | 100 |
| | KG (K) | 5 | 100229 | 264709 | |
| | CSV (C) | 4 | 70276 | 184657 | |
| Books | JSON (J) | 3 | 3392 | 2824 | 100 |
| | CSV (C) | 3 | 2547 | 1812 | |
| | XML (X) | 4 | 2054 | 1509 | |
| Flights | CSV (C) | 10 | 48672 | 100835 | 100 |
| | JSON (J) | 10 | 41939 | 89339 | |
| Stocks | CSV (C) | 10 | 7799 | 11169 | 100 |
| | JSON (J) | 10 | 7759 | 10619 | |
| Datasets | Data source | TF (Baseline) F1/% | Time/s | LTM (Baseline) F1/% | Time/s | IR-CoT (SOTA) F1/% | Time/s | MDQA (SOTA) F1/% | Time/s | ChatKBQA (SOTA) F1/% | Time/s | FusionQuery (SOTA) F1/% | Time/s | MCC (Ours) F1/% | Time/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Movies | J/K | 37.1 | 9717 | 41.4 | 1995 | 43.2 | 1567 | 46.2 | 1588 | 45.1 | 3809 | 53.2 | 122.4 | 52.6 | 98.3 |
| | J/C | 41.9 | 7214 | 42.9 | 1884 | 45.0 | 1399 | 44.5 | 1360 | 42.7 | 3246 | 52.7 | 183.1 | 54.3 | 75.1 |
| | K/C | 37.8 | 2199 | 41.2 | 1576 | 37.6 | 1014 | 45.2 | 987 | 40.4 | 2027 | 42.5 | 141.0 | 49.1 | 86.0 |
| | J/K/C | 36.6 | 11225 | 40.8 | 2346 | 41.5 | 2551 | 49.8 | 2264 | 44.7 | 5151 | 53.6 | 137.8 | 54.8 | 157 |
| Books | J/C | 40.2 | 1017 | 42.4 | 195.3 | 35.2 | 147.6 | 55.7 | 124.2 | 56.1 | 165.0 | 58.5 | 22.7 | 63.5 | 13.66 |
| | J/X | 35.5 | 1070 | 35.6 | 277.7 | 36.1 | 178.7 | 55.1 | 115.6 | 54.7 | 200.1 | 57.9 | 20.6 | 63.1 | 13.78 |
| | C/X | 43.0 | 1033 | 44.1 | 232.6 | 42.6 | 184.5 | 57.2 | 115.6 | 55.6 | 201.4 | 60.3 | 21.5 | 64.2 | 13.54 |
| | J/C/X | 37.3 | 2304 | 41.0 | 413.2 | 40.4 | 342.6 | 56.4 | 222.6 | 57.1 | 394.1 | 59.1 | 47.0 | 66.8 | 27.4 |
| Flights | C/J | 27.3 | 6049 | 79.1 | 14786 | 58.3 | 214.0 | 76.5 | 360 | 76.8 | 376 | 74.2 | 20.2 | 74.9 | 80 |
| Stocks | C/J | 68.4 | 2.30 | 19.2 | 1337 | 64.8 | 53.3 | 65.2 | 78.4 | 64.0 | 88.9 | 68.0 | 0.33 | 78.6 | 12.1 |
| Datasets | Source | MultiRAG F1/% | QT/s | PT/s | w/o MKA F1/% | QT/s | PT/s | w/o Graph Level F1/% | QT/s | PT/s | w/o Node Level F1/% | QT/s | PT/s | w/o MCC F1/% | QT/s | PT/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Movies | J/K | 52.6 | 25.7 | 62.64 | 48.2 | 2783 | 62.64 | 45.3 | 50.1 | 58.2 | 38.7 | 21.3 | 0.31 | 31.6 | 25.7 | 0.28 |
| | J/C | 54.3 | 12.7 | 61.36 | 49.1 | 1882 | 61.36 | 46.8 | 28.9 | 57.4 | 40.2 | 10.5 | 0.29 | 30.5 | 12.7 | 0.29 |
| | K/C | 49.1 | 31.6 | 64.40 | 45.5 | 4233 | 64.40 | 42.7 | 65.3 | 61.8 | 35.9 | 28.4 | 0.27 | 33.1 | 31.6 | 0.29 |
| | J/K/C | 54.8 | 39.2 | 60.8 | 47.5 | 4437 | 60.8 | 48.1 | 75.6 | 56.2 | 41.5 | 35.8 | 0.30 | 34.7 | 39.2 | 0.32 |
| Books | J/C | 63.5 | 1.19 | 2.47 | 57.1 | 11.9 | 2.47 | 55.2 | 4.7 | 2.12 | 49.8 | 0.92 | 0.18 | 43.4 | 1.19 | 0.22 |
| | J/X | 63.1 | 1.22 | 2.56 | 59.3 | 11.7 | 2.62 | 54.7 | 5.1 | 2.24 | 48.3 | 0.89 | 0.19 | 42.6 | 1.22 | 0.22 |
| | C/X | 64.2 | 1.16 | 2.38 | 55.3 | 8.39 | 2.38 | 53.9 | 3.9 | 2.05 | 47.1 | 0.85 | 0.16 | 41.0 | 1.16 | 0.17 |
| | J/C/X | 66.8 | 1.31 | 3.07 | 57.2 | 15.8 | 3.08 | 59.4 | 6.3 | 2.89 | 52.7 | 1.12 | 0.21 | 36.4 | 1.31 | 0.20 |
| Flights | C/J | 74.9 | 29.8 | 109.9 | 72.2 | N/A | 109.9 | 68.3 | 142.7 | 98.5 | 61.4 | 25.3 | 0.85 | 52.1 | 29.8 | 1.07 |
| Stocks | C/J | 78.6 | 2.72 | 5.36 | 69.6 | 450.8 | 5.36 | 72.1 | 8.9 | 4.12 | 65.3 | 1.98 | 0.15 | 45.4 | 2.72 | 0.17 |
| Method | HotpotQA Precision | HotpotQA Recall@5 | 2WikiMultiHopQA Precision | 2WikiMultiHopQA Recall@5 |
| --- | --- | --- | --- | --- |
| Standard RAG | 34.1 | 33.5 | 25.6 | 26.2 |
| GPT-3.5-Turbo+CoT | 33.9 | 47.2 | 35.0 | 45.1 |
| IRCoT | 41.6 | 41.2 | 42.3 | 40.9 |
| ChatKBQA | 47.8 | 42.1 | 46.5 | 43.7 |
| MDQA | 48.6 | 52.5 | 44.1 | 45.8 |
| RQ-RAG | 51.6 | 49.3 | 45.3 | 44.6 |
| MetaRAG | 51.1 | 49.9 | 50.7 | 52.2 |
| MultiRAG | 59.3 | 62.7 | 55.7 | 61.2 |
| Stage | Content |
| --- | --- |
| Data Sources | Structured: CA981, PEK, JFK, Delayed, 2024-10-01 14:30; Semi-structured: \{"flight": "CA981", "delay_reason": "Weather", "source": "AirChina"\}; Unstructured: "Typhoon Haikui impacts PEK departures after 14:00." |
| MKA Module | Structured parsing: flight attribute mapping; LLM extraction: (CA981, DelayReason, Typhoon) @0.87 |
| MCC Module | With GCC: graph confidence = 0.71 (threshold = 0.5), filtered: ForumUser123 (0.47); Without GCC: unfiltered conflict = 2 subgraphs |
| LLM Context | Trusted: CA981.Status=Delayed (0.89), DelayReason=Typhoon (0.85); Conflicts: ForumUser123:On-time (0.47), WeatherAPI:Clear (0.52) |
| Final Answer | Correct: "CA981 delayed until after 14:30 due to typhoon"; Hallucinated: "CA981 on-time with possible delay after 14:30" |