Fix grammatical errors

2026-01-31 15:53:28 +08:00
parent 1b8236accc
commit dc93243be6
1 changed files with 9 additions and 9 deletions
--- a/mySkyline-v7.5.tex
+++ b/mySkyline-v7.5.tex
@@ -258,7 +258,7 @@
 \IEEEcompsoctitleabstractindextext{%
 \begin{abstract}
 %\boldmath
-High-performance remote sensing analytics workflows require ingesting and retrieving massive image archives to support real-time spatio-temporal applications. While modern systems utilize window-based I/O reading to reduce data transfer, they face a dual bottleneck: the prohibitive overhead of runtime geospatial computations caused by the decoupling of logical indexing from physical storage, and severe storage-level I/O contention triggered by uncoordinated concurrent reads. To address these limitations, we present a comprehensive I/O-aware retrieval approach based on a novel "Index-as-an-Execution-Plan" paradigm. We introduce a dual-layer inverted index that serves as a I/O planner, pre-materializing grid-to-pixel mappings to completely eliminate runtime geometric calculations. Furthermore, we design a hybrid concurrency-aware I/O coordination protocol that adaptively integrates Calvin-style deterministic ordering with optimistic execution, effectively converting I/O contention into request merging opportunities. To handle fluctuating workloads, we incorporate a Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB) for automatic parameter tuning. Evaluated on a distributed cluster with martian datasets, the experimental results indicate that (1) I/O-aware indexing reduces retrieval latency by an order of magnitude， (2) hybrid concurrency-aware I/O coordination achieves a 54x speedup under high contention through request merging and automates optimal mode switching (3) SA-GMAB has the fastest convergence speed and recovers from workload shifts $2\times$ faster than TunIO.
+High-performance remote sensing analytics workflows require ingesting and retrieving massive image archives to support real-time spatio-temporal applications. While modern systems utilize window-based I/O reading to reduce data transfer, they face a dual bottleneck: the prohibitive overhead of runtime geospatial computations caused by the decoupling of logical indexing from physical storage, and severe storage-level I/O contention triggered by uncoordinated concurrent reads. To address these limitations, we present a comprehensive I/O-aware retrieval approach based on a novel "Index-as-an-Execution-Plan" paradigm. We introduce a dual-layer inverted index that serves as an I/O planner, pre-materializing grid-to-pixel mappings to completely eliminate runtime geometric calculations. Furthermore, we design a hybrid concurrency-aware I/O coordination protocol that adaptively integrates Calvin-style deterministic ordering with optimistic execution, effectively converting I/O contention into request merging opportunities. To handle fluctuating workloads, we incorporate a Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB) for automatic parameter tuning. Evaluated on a distributed cluster with martian datasets, the experimental results indicate that (1) I/O-aware indexing reduces retrieval latency by an order of magnitude, (2) hybrid concurrency-aware I/O coordination achieves a 54x speedup under high contention through request merging and automates optimal mode switching, and (3) SA-GMAB has the fastest convergence speed and recovers from workload shifts $2\times$ faster than TunIO.
 \end{abstract}
 % IEEEtran.cls defaults to using nonbold math in the Abstract.
 % This preserves the distinction between vectors and scalars. However,
@@ -326,14 +326,14 @@ To address the problems above, we propose a novel "Index-as-an-Execution-Plan" p

 	\item We propose a hybrid concurrency-aware I/O coordination protocol. This protocol adapts transaction processing principles by integrating Calvin-style deterministic ordering \cite{Thomson12Calvin} with optimistic execution \cite{Lim17OCC}. It shifts the focus from protecting database rows to coordinating shared I/O flows. This protocol dynamically switches strategies based on spatial contention, effectively converting "I/O contention" into "request merging opportunities."

-	\item We proposed an automatic I/O tuning method to improve the I/O performance of spatio-temporal range retrievals over remote sensing data. The method extends an existing AI-powered I/O tuning framework \cite{Rajesh24TunIO} based on a surrogate-assisted genetic multi-armed bandits algorithm \cite{Preil25GMAB}.
+	\item We propose an automatic I/O tuning method to improve the I/O performance of spatio-temporal range retrievals over remote sensing data. The method extends an existing AI-powered I/O tuning framework \cite{Rajesh24TunIO} based on a surrogate-assisted genetic multi-armed bandit algorithm \cite{Preil25GMAB}.
 \end{enumerate}

 \par
 The remainder of this paper is organized as follows:
 Section~\ref{sec:RW} presents the related work.
 Section~\ref{sec:DF} proposes the definition concerning the spatio-temporal range retrieval problem.
-Section~\ref{sec:Index} proposes the indexing structre.
+Section~\ref{sec:Index} proposes the indexing structure.
 Section~\ref{sec:CC} proposes the hybrid concurrency control protocol.
 Section~\ref{sec:Tuning} proposes the method of I/O stack tuning.
 Section~\ref{sec:EXP}  presents the experiments and results.
@@ -416,8 +416,8 @@ subject to:
 	\item \textit{Isolation:} Concurrent reads must effectively share I/O bandwidth without causing starvation or excessive thrashing.
 \end{enumerate}

-\section{I/O-aware Indexing stucture}\label{sec:Index}
-This section introduces the details of indexing structre for spatio-temporal range retrieval over remote sensing image data.
+\section{I/O-aware Indexing Structure}\label{sec:Index}
+This section introduces the details of indexing structure for spatio-temporal range retrieval over remote sensing image data.

 \begin{figure*}[htb]
 	\centering
@@ -801,7 +801,7 @@ All experiments are conducted on a cluster with 9 homogenous nodes (1 master nod
 \end{table}

 \subsection{Evaluating the data indexing structure}
-In the following experiments, we measured the indexing on a single node in the cluster, bacause each nodes needs to the indexing for spatial retrieval. We investigated of retrieval performance of the indexing for remote sensing images. 
+In the following experiments, we measured the indexing on a single node in the cluster, because each node needs to perform indexing for spatial retrieval. We investigated the retrieval performance of the indexing for remote sensing images. 

 For comparison, we compare three representative execution schemes: 

@@ -970,7 +970,7 @@ The medium-overlap scenario (Fig.~\ref{fig:cc_exp1}(c)) serves as a transition p
 	\label{fig:cc_exp3}
 \end{figure}

-To uncover the cause of the significant latency reduction observed in high-contention scenarios ($\sigma=0.8$),, we further analyzed the internal I/O behavior of the system. Specifically, we measured the total volume of physical data transferred from disk and the number of backend I/O requests issued to the storage system. Fig.~\ref{fig:cc_exp3} compares the physical storage pressure between the shared index baseline and ours method.
+To uncover the cause of the significant latency reduction observed in high-contention scenarios ($\sigma=0.8$), we further analyzed the internal I/O behavior of the system. Specifically, we measured the total volume of physical data transferred from disk and the number of backend I/O requests issued to the storage system. Fig.~\ref{fig:cc_exp3} compares the physical storage pressure between the shared index baseline and our method.

 Fig.~\ref{fig:cc_exp3}(a) plots the total physical data read size. The baseline exhibits a strict linear increase in data volume. At $N=64$, the system is forced to fetch 32 GB of data. This confirms that without coordination, logically overlapping queries translate into redundant physical reads, leading to severe bandwidth saturation. In contrast, our approach effectively decouples logical demand from physical execution. Although 64 clients logically request 32 GB of data, the request collapse mechanism merges these overlapping windows, resulting in only 5 GB of actual disk traffic. This 84\% reduction in data volume explains why our system avoids the bandwidth bottleneck.

@@ -1002,7 +1002,7 @@ Our hybrid approach successfully combines the benefits of both worlds. As shown
 \subsection{Evaluating the I/O tuning}
 In this section, we evaluate the effectiveness of the proposed SA-GMAB tuning framework. The experiments are designed to verify four key properties: fast convergence speed, robustness against stochastic noise, adaptability to workload shifts, and tangible end-to-end performance gains.

-For comparison, For comparison, we benchmark against three representative tuning strategies: 
+For comparison, we benchmark against three representative tuning strategies: 

 \begin{enumerate} 
 	\item \textbf{Genetic algorithm (GA):} The standard genetic algorithm to explore the configuration space, serving as the basic algorithm in the TunIO.
@@ -1056,7 +1056,7 @@ We further investigated the system's resilience in non-stationary environments.
 Fig.~\ref{fig:tune_exp3} illustrates the latency evolution before and after the shift. At $t=60$, the workload transition causes an immediate performance collapse across all methods, with latency spiking from a stable $\approx 50$ ms to $>300$ ms. This confirms that the configuration optimal for the previous phase is detrimental in the new environment. The GA-based method fails to adapt effectively. Post-shift, its latency hovers around $290-300$ ms. Lacking a mechanism to quickly reset or guide exploration, the genetic algorithm remains trapped in the local optima of the previous workload, exhibiting almost zero recovery within the observation window. TunIO manages to reduce latency but at a slow pace. It takes 40 steps to lower the latency from 308 ms to 134 ms ($t=100$). While the RL agent eventually learns the new reward function, the high sample complexity delays the recovery, leaving the system in a suboptimal state for a prolonged period. In contrast, SA-GMAB executes a decisive recovery. By leveraging the surrogate model to filter high-uncertainty candidates, it rapidly identifies the new optimal region. The latency drops to $\approx 88$ ms at $t=80$ and further stabilizes at $\approx 74$ ms at $t=100$.

 \section{Conclusions}\label{sec:Con}
-This paper presents a comprehensive I/O-aware retrieval approach designed to strictly bound retrieval latency and maximize throughput for large-scale spatio-temporal analytics. By introducing the "Index-as-an-Execution-Plan" paradigm, the dual-layer inverted index bridge the semantic gap between logical indexing and physical storage, effectively shifting the computational burden from retrieval time to ingestion time. To address the scalability challenges in concurrent environments, we developed a hybrid concurrency-aware I/O coordination protocol that adaptively switches between deterministic ordering and optimistic execution based on spatial contention. Furthermore, to handle the complexity of parameter configuration in fluctuating workloads, we integrated SA-GMAB nethod for online automatic I/O tuning. The experimental results indicate that (1) I/O-aware index achieves an order-of-magnitude latency reduction with negligible storage overhead, (2) the hybrid coordination protocol realizes a $54\times$ throughput improvement in high-overlap scenarios, (3) SA-GMAB method recovers from workload shifts $2\times$ faster than RL baselines while maximizing RoTI. Future work will explore extending the coordination protocol to support more complex analytical operators, such as distributed pixel-level join and aggregation, and integrating the tuning framework with tiered storage hierarchies to further optimize performance in cloud-native environments.
+This paper presents a comprehensive I/O-aware retrieval approach designed to strictly bound retrieval latency and maximize throughput for large-scale spatio-temporal analytics. By introducing the "Index-as-an-Execution-Plan" paradigm, the dual-layer inverted index bridges the semantic gap between logical indexing and physical storage, effectively shifting the computational burden from retrieval time to ingestion time. To address the scalability challenges in concurrent environments, we developed a hybrid concurrency-aware I/O coordination protocol that adaptively switches between deterministic ordering and optimistic execution based on spatial contention. Furthermore, to handle the complexity of parameter configuration in fluctuating workloads, we integrated the SA-GMAB method for online automatic I/O tuning. The experimental results indicate that (1) I/O-aware index achieves an order-of-magnitude latency reduction with negligible storage overhead, (2) the hybrid coordination protocol realizes a $54\times$ throughput improvement in high-overlap scenarios, and (3) the SA-GMAB method recovers from workload shifts $2\times$ faster than RL baselines while maximizing RoTI. Future work will explore extending the coordination protocol to support more complex analytical operators, such as distributed pixel-level join and aggregation, and integrating the tuning framework with tiered storage hierarchies to further optimize performance in cloud-native environments.

 % if have a single appendix:
 %\appendix[Proof of the Zonklar Equations]