From 37e814d51c0fc2485c99cade65a7820ad4d39b0f Mon Sep 17 00:00:00 2001 From: along <1015042407@qq.com> Date: Sun, 1 Feb 2026 14:55:59 +0800 Subject: [PATCH] =?UTF-8?q?=E6=B6=A6=E8=89=B2?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- mySkyline-v7.5.tex | 46 ++++++++++++++++++++++------------------------ 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/mySkyline-v7.5.tex b/mySkyline-v7.5.tex index a6a0a3a..c823c3e 100644 --- a/mySkyline-v7.5.tex +++ b/mySkyline-v7.5.tex @@ -177,7 +177,7 @@ % % paper title % can use linebreaks \\ within to get better formatting as desired -\title{An I/O efficient approach for concurrent spatio-temporal range retrievals over large-scale remote sensing image data} +\title{An I/O-Efficient Approach for Concurrent Spatio-Temporal Range Retrievals over Large-Scale Remote Sensing Image Data} % % % author names and IEEE memberships @@ -258,7 +258,7 @@ \IEEEcompsoctitleabstractindextext{% \begin{abstract} %\boldmath -High-performance remote sensing analytics workflows require ingesting and retrieving massive image archives to support real-time spatio-temporal applications. While modern systems utilize window-based I/O reading to reduce data transfer, they face a dual bottleneck: the prohibitive overhead of runtime geospatial computations caused by the decoupling of logical indexing from physical storage, and severe storage-level I/O contention triggered by uncoordinated concurrent reads. To address these limitations, we present a comprehensive I/O-aware retrieval approach based on a novel "Index-as-an-Execution-Plan" paradigm. We introduce a dual-layer inverted index that serves as an I/O planner, pre-materializing grid-to-pixel mappings to completely eliminate runtime geometric calculations. 
Furthermore, we design a hybrid concurrency-aware I/O coordination protocol that adaptively integrates Calvin-style deterministic ordering with optimistic execution, effectively converting I/O contention into request merging opportunities. To handle fluctuating workloads, we incorporate a Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB) for automatic parameter tuning. Evaluated on a distributed cluster with martian datasets, the experimental results indicate that (1) I/O-aware indexing reduces retrieval latency by an order of magnitude, (2) hybrid concurrency-aware I/O coordination achieves a 54x speedup under high contention through request merging and automates optimal mode switching, and (3) SA-GMAB has the fastest convergence speed and recovers from workload shifts $2\times$ faster than TunIO. +High-performance remote sensing analytics workflows require ingesting and retrieving massive image archives to support real-time spatio-temporal applications. While modern systems utilize window-based I/O reading to reduce data transfer, they face a dual bottleneck: (1) the prohibitive overhead of runtime geospatial computations caused by the decoupling of logical indexing from physical storage, and (2) severe storage-level I/O contention triggered by uncoordinated concurrent reads. To address these limitations, we present a comprehensive I/O-aware retrieval approach based on a novel ``Index-as-an-Execution-Plan'' paradigm. We introduce a dual-layer inverted index that serves as an I/O planner, pre-materializing grid-to-pixel mappings to completely eliminate runtime geometric calculations. Furthermore, we design a hybrid concurrency-aware I/O coordination protocol that adaptively integrates Calvin-style deterministic ordering with optimistic execution, effectively converting I/O contention into request merging opportunities. To handle fluctuating workloads, we incorporate a Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB) for automatic parameter tuning. 
Experimental results on a distributed cluster with Martian datasets indicate that: (1) I/O-aware indexing reduces retrieval latency by an order of magnitude; (2) hybrid concurrency-aware I/O coordination achieves a $54\times$ speedup under high contention through request merging and automates optimal mode switching; and (3) SA-GMAB has the fastest convergence speed and recovers from workload shifts $2\times$ faster than TunIO. \end{abstract} % IEEEtran.cls defaults to using nonbold math in the Abstract. % This preserves the distinction between vectors and scalars. However, @@ -346,7 +346,7 @@ This section describes the most salient studies of I/O-efficient spatio-temporal Efficient spatio-temporal query processing for remote sensing data has been extensively studied, with early efforts primarily focusing on metadata organization and index-level pruning in relational database systems. Traditional approaches typically extend tree-based spatial indexes, such as R-tree \cite{Strobl08PostGIS}, quadtree \cite{Tang12Quad-Tree}, and their spatio-temporal variants \cite{Simoes16PostGIST}, to organize image footprints together with temporal attributes, and are commonly implemented on relational backends (e.g., MySQL and PostgreSQL). These methods provide efficient range filtering for moderate-scale datasets, but their reliance on balanced tree structures often leads to high maintenance overhead and limited scalability as the volume of remote sensing metadata grows rapidly. With the continuous increase in data volume and ingestion rate, recent systems have gradually shifted toward grid-based spatio-temporal indexing schemes deployed on distributed NoSQL stores. 
By encoding spatial footprints into uniform spatial grids using GeoHash \cite{suwardi15geohash}, GeoSOT \cite{Yan21RS_manage1}, or space-filling curves \cite{hughes15geomesa}, \cite{liu24mstgi}, and combining them with temporal identifiers, these approaches enable lightweight index construction and better horizontal scalability on backends such as HBase and Elasticsearch. Such grid-based indexes can effectively reduce the candidate search space through coarse-grained pruning and are more suitable for large-scale, continuously growing remote sensing archives. \par -However, index pruning alone is insufficient to guarantee end-to-end retrieval efficiency for remote sensing workloads, where individual images are usually large and retrieval results require further pixel-level processing. To reduce the amount of raw I/O, Google Earth system \cite{gorelick17GEE} rely on tiling and multi-resolution pyramids that physically split images into small blocks. While more recent solutions leverage COG and window-based I/O to enable partial reads from monolithic image files. Frameworks such as OpenDataCube \cite{LEWIS17datacube} exploit these features to read only the image regions intersecting a retrieval window, thereby reducing unnecessary data transfer. Nevertheless, after candidate images are identified, most systems still perform fine-grained geospatial computations for each image, including coordinate transformations and precise pixel-window derivation, which may incur substantial overhead when many images are involved. +However, index pruning alone is insufficient to guarantee end-to-end retrieval efficiency for remote sensing workloads, where individual images are usually large and retrieval results require further pixel-level processing. To reduce the amount of raw I/O, the Google Earth Engine system \cite{gorelick17GEE} relies on tiling and multi-resolution pyramids that physically split images into small blocks. 
More recent solutions leverage Cloud Optimized GeoTIFF (COG) and window-based I/O to enable partial reads from monolithic image files. Frameworks such as OpenDataCube \cite{LEWIS17datacube} exploit these features to read only the image regions intersecting a retrieval window, thereby reducing unnecessary data transfer. Nevertheless, after candidate images are identified, most systems still perform fine-grained geospatial computations for each image, including coordinate transformations and precise pixel-window derivation, which may incur substantial overhead when many images are involved. \subsection{Concurrency Control} Concurrency control has long been studied to provide correctness and high throughput in multi-user database and storage systems, with two broad paradigms dominating the literature: deterministic scheduling \cite{Thomson12Calvin} and non-deterministic schemes \cite{Bernstein812PL}, \cite{KungR81OCC}. Hybrid approaches \cite{WangK16MVOCC}, \cite{Hong25HDCC} that adaptively combine these paradigms seek to exploit the low-conflict efficiency of deterministic execution while retaining the flexibility of optimistic techniques. More recent proposals such as OOCC target read-heavy, disaggregated settings by reducing validation and round-trips for read-only transactions, achieving low latency under OLTP-like workloads \cite{Wu25OOCC}. These CC families are primarily optimized for record- or key-level access patterns: their metrics and designs emphasize transaction latency, abort rates, and throughput under workloads with small, well-defined read/write sets.
\par -Definition 2 (Spatio-temporal Range Retrieval). Given a dataset $\mathbb{R}$, a retrieval $Q$ is defined by a spatio-temporal predicate $Q = \langle S, T \rangle$, where $S$ is the spatial bounding box and $T$ is the time interval. The retrieval result set $\mathcal{R}_Q$ is defined as: +Definition 2 (Spatio-temporal Range Retrieval). Given a dataset $\mathbb{R}$, a retrieval query $Q$ is defined by a spatio-temporal predicate $Q = \langle S, T \rangle$, where $S$ is the spatial bounding box and $T$ is the time interval. The retrieval result set $\mathcal{R}_Q$ is defined as: \vspace{-0.05in} \begin{equation} \label{eqn:pre_st_query} @@ -492,7 +492,7 @@ $\{g_1, \ldots, g_k\}$ whose footprints intersect the retrieval bounding box. \par -Each grid cell corresponds to a unique 64-bit \textit{GridKey}, which directly matches the primary key of the G2I table. This design has important implications: grid enumeration has constant depth and low computational cost and the resulting GridKeys can be directly used as lookup keys without any geometric refinement. Consequently, spatial key generation is reduced to simple arithmetic operations on integer grid coordinates. +Each grid cell corresponds to a unique 64-bit \textit{GridKey}, which directly matches the primary key of the G2I table. This design has important implications: grid enumeration has constant depth and low computational cost, and the resulting GridKeys can be directly used as lookup keys without any geometric refinement. Consequently, spatial key generation is reduced to simple arithmetic operations on integer grid coordinates. \par \textbf{Candidate Image Retrieval with Temporal Pruning.} @@ -619,13 +619,13 @@ A retrieval is considered complete when all windows in its I/O plan have been se Overall, this concurrency-aware I/O coordination mechanism reinterprets concurrency control as a problem of \emph{coordinating shared I/O flows}. 
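The constant-cost GridKey generation described above can be made concrete with a minimal Python sketch. The exact 64-bit layout below (an 8-bit level followed by two 28-bit grid coordinates) is our illustrative assumption, not the paper's specified encoding:

```python
def grid_key(level: int, gx: int, gy: int) -> int:
    """Pack a grid cell into a single 64-bit integer key.
    Assumed layout: level in the top 8 bits, then 28-bit gx and gy."""
    assert 0 <= level < (1 << 8) and 0 <= gx < (1 << 28) and 0 <= gy < (1 << 28)
    return (level << 56) | (gx << 28) | gy

def enumerate_grid_keys(level, min_gx, min_gy, max_gx, max_gy):
    # Enumerate keys for every cell intersecting the query bounding box
    # using integer arithmetic only -- no geometric refinement is needed.
    return [grid_key(level, gx, gy)
            for gx in range(min_gx, max_gx + 1)
            for gy in range(min_gy, max_gy + 1)]
```

Because each key doubles as the G2I primary key, the resulting integers can be issued directly as point lookups against the index table.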
By operating at the granularity of windowed reads and leveraging deterministic ordering and optimistic execution, it effectively reduces redundant I/O and improves scalability for multi-user spatio-temporal retrieval workloads. \section{I/O Stack Tuning}\label{sec:Tuning} -We first describe an I/O stack tuning problem and then the surrogate-assisted GMAB algorithm is proposed to solve the problem. +We first describe the I/O stack tuning problem and then propose the surrogate-assisted GMAB algorithm to solve it. \subsection{Formulation of Online I/O Tuning} -We study a concurrency spatio-temporal retrieval engine that processes many range retrievals at the same time. The system works on large remote sensing images stored in shared storage. Different from traditional HPC jobs or single-application I/O workloads, the system does not run one fixed job. Instead, it keeps receiving a stream of user retrievals. Each retrieval is turned into many small I/O operations that often touch overlapping regions in large raster files. +We study a concurrent spatio-temporal retrieval engine that processes many range retrievals simultaneously. The system operates on large remote sensing images stored in shared storage. Unlike traditional HPC jobs or single-application I/O workloads, the system does not run one fixed job. Instead, it continuously receives a stream of user retrievals. Each retrieval is turned into many small I/O operations that often touch overlapping regions in large raster files.
In this way, every retrieval produces an I/O execution context $c= \langle W,M,S \rangle$, where $W$ describes the set of image windows to be accessed, including their sizes, spatial overlap, and distribution across images. $M$ captures window-level coordination opportunities, such as window merging, deduplication, or shared reads across concurrent retrievals. $S$ represents system-level execution decisions, including batching strategies, I/O scheduling order, and concurrency limits. Importantly, the I/O behavior of the system is not determined solely by static application code, but emerges dynamically from the interaction between retrieval workloads, execution plans, and system policies. +Let $\mathcal{Q} = \{Q_1, Q_2, \ldots, Q_N\}$ denote a stream of spatio-temporal range retrievals submitted by multiple users. Each retrieval $q$ is decomposed by the I/O-aware index into a set of grid-aligned spatial windows based on a predefined global grid system. These windows are further mapped to sub-regions of one or more large remote sensing images. In this way, every retrieval produces an I/O execution context $c= \langle W,M,S \rangle$, where $W$ describes the set of image windows to be accessed, including their sizes, spatial overlap, and distribution across images; $M$ captures window-level coordination opportunities, such as window merging, deduplication, or shared reads across concurrent retrievals; and $S$ represents system-level execution decisions, including batching strategies, I/O scheduling order, and concurrency limits. Importantly, the I/O behavior of the system is not determined solely by static application code, but emerges dynamically from the interaction between retrieval workloads, execution plans, and system policies. \par The goal of I/O tuning in this system is to optimize the performance of retrieval-induced I/O execution under continuous, concurrent workloads. 
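As a concrete illustration of the coordination opportunities captured by $M$, the sketch below deduplicates identical windows requested by concurrent retrievals so each distinct region is read once and shared. The tuple-based request representation is our simplification, not the system's actual data structure:

```python
from collections import defaultdict

def build_shared_read_plan(requests):
    """Collapse window reads from concurrent retrievals: each distinct
    (image, window) pair is read once and fanned out to all requesters.
    A request is a (retrieval_id, image_id, window) tuple, where the
    window is a hashable (col_off, row_off, width, height) tuple."""
    plan = defaultdict(list)
    for retrieval_id, image_id, window in requests:
        plan[(image_id, window)].append(retrieval_id)
    return dict(plan)

requests = [
    (1, "img_a", (0, 0, 256, 256)),
    (2, "img_a", (0, 0, 256, 256)),   # same window: merged with retrieval 1
    (2, "img_b", (256, 0, 256, 256)),
]
plan = build_shared_read_plan(requests)
# three logical window reads collapse into two physical reads
```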
We focus on minimizing the observed I/O cost per retrieval, which may be measured by metrics such as average retrieval latency, effective I/O throughput, or amortized disk read time. Let $\theta \in \varTheta$ denote a tuning configuration, where each configuration specifies a combination of system-level I/O control parameters, including window batching size, merge thresholds, queue depth, concurrency limits, and selected storage-level parameters exposed to the engine. Unlike traditional I/O tuning frameworks, the decision variables $\theta$ are applied at the retrieval execution level, rather than at application startup or compilation time. @@ -708,10 +708,10 @@ subject to practical constraints on tuning overhead and system stability. To address the online I/O tuning problem, we use a Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB) framework. It combines genetic search, bandit-style exploration, and a simple performance model. The goal is to handle workloads where behavior changes over time, where results are random, and where retrievals may affect each other. The main steps of this framework are shown in Algorithm~\ref{alg:sa-gmab}. \par -We first initialize the memory table and the surrogate model, and then generate an initial population of configurations (lines 1-–4). In our system, each arm is an I/O tuning configuration $\theta \in \varTheta$. A configuration is a group of I/O control parameters, such as merge thresholds, batch size, queue depth, and limits on parallel requests. The space of possible configurations is large and discrete. It is not possible to list or test all of them. So we do not fix all arms in advance. Instead, new configurations are created dynamically by genetic operators during candidate generation (line 6). Each configuration acts as a policy that tells the system how to run I/O plans during a scheduling period. 
+We first initialize the memory table and the surrogate model, and then generate an initial population of configurations (lines 1--4). In our system, each arm is an I/O tuning configuration $\theta \in \varTheta$. A configuration is a group of I/O control parameters, such as merge thresholds, batch size, queue depth, and limits on parallel requests. The space of possible configurations is large and discrete. It is not possible to list or test all of them. Therefore, we do not fix all arms in advance. Instead, new configurations are created dynamically by genetic operators during candidate generation (line 6). Each configuration acts as a policy that tells the system how to run I/O plans during a scheduling period. \par -When a retrieval $q_t$ with context $c_t$ arrives, the framework enters the online tuning loop (line 5). For this retrieval, a set of candidate configurations is created through selection, crossover, and mutation (line 6). For every candidate configuration, the surrogate model predicts its reward under the current context (lines 7–-9). These predicted rewards are then used to filter and keep only the top promising configurations, or those with high uncertainty (line 10). +When a retrieval $q_t$ with context $c_t$ arrives, the framework enters the online tuning loop (line 5). For this retrieval, a set of candidate configurations is created through selection, crossover, and mutation (line 6). For every candidate configuration, the surrogate model predicts its reward under the current context (lines 7--9). These predicted rewards are then used to filter and keep only the top promising configurations, or those with high uncertainty (line 10). \par When a configuration $\theta$ is used to process a retrieval $q_t$ with context $c_t$, the system observes a random performance result $Y_t=Y\left( \theta ,c_t \right)$. We define the reward as a simple transformation of I/O cost so that a higher reward means better performance. 
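The filtering step keeps candidates that either look promising or remain uncertain. A standard way to express this exploration/exploitation trade-off is a UCB-style score combined with an incremental memory update; the concrete formulas below are our assumption, since the text does not fix a specific form:

```python
import math

def ucb_score(mean_reward, pulls, total_pulls, c=1.0):
    # Historical average plus an exploration bonus that shrinks as a
    # configuration accumulates samples; untried arms sort first.
    if pulls == 0:
        return float("inf")
    return mean_reward + c * math.sqrt(math.log(total_pulls) / pulls)

def update_memory(entry, reward):
    # Incremental running-average update of a configuration's memory
    # entry after one observed execution (reward = negative latency).
    pulls = entry["pulls"] + 1
    mean = entry["mean"] + (reward - entry["mean"]) / pulls
    return {"pulls": pulls, "mean": mean}

entry = {"pulls": 0, "mean": 0.0}
entry = update_memory(entry, -50.0)
entry = update_memory(entry, -30.0)
# entry["mean"] is now -40.0; its UCB score adds an exploration bonus on top
```

Keeping all historical observations in this running form is what lets the estimates sharpen over time without retrying known-poor configurations.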
A common form is the negative latency of the retrieval, or the negative I/O time per unit work. Because other retrievals run at the same time, the reward may change even for the same configuration. Thus, many samples are needed to estimate the expected reward. @@ -720,9 +720,9 @@ When a configuration $\theta$ is used to process a retrieval $q_t$ with context For the remaining candidates, the framework computes a bandit score using both historical average reward and exploration term (lines 11--13), and then selects the configuration with the highest score (line 14). In this way, the method prefers configurations that have performed well before, but it also tries configurations that have been used only a few times. \par -The selected configuration is then applied to execute the retrieval (line 15). After execution, the system observes the performance result and converts it into a reward value (line 16). For each configuration $\theta$, the system keeps a memory entry that records how many times it has been used and its average reward. These values are updated after each execution (lines 17-–18). This keeps all historical observations instead of discarding older ones, so estimates become more accurate over time, and poor configurations are not repeatedly tried. +The selected configuration is then applied to execute the retrieval (line 15). After execution, the system observes the performance result and converts it into a reward value (line 16). For each configuration $\theta$, the system keeps a memory entry that records how many times it has been used and its average reward. These values are updated after each execution (lines 17--18). This keeps all historical observations instead of discarding older ones, so estimates become more accurate over time, and poor configurations are not repeatedly tried. -The selected configuration may also be added into the population, while poor ones may be removed (line 19). 
The surrogate model is retrained periodically using data stored in memory (lines 20-–22), so that its predictions follow the most recent workload. The tuning step counter is then increased (line 23), and the framework continues with the next retrieval (line 24). +The selected configuration may also be added into the population, while poor ones may be removed (line 19). The surrogate model is retrained periodically using data stored in memory (lines 20--22), so that its predictions follow the most recent workload. The tuning step counter is then increased (line 23), and the framework continues with the next retrieval (line 24). \section{Performance Evaluation}\label{sec:EXP} First, we introduce the experimental setup, covering the dataset characteristics, retrieval workload generation, and the distributed cluster environment. Then, we present the experimental results evaluating the proposed I/O-aware indexing structure, the hybrid concurrency-aware I/O coordination mechanism, and the online I/O tuning framework, respectively. @@ -800,14 +800,15 @@ All experiments are conducted on a cluster with 9 homogenous nodes (1 master nod \end{tabular} \end{table} -\subsection{Evaluating the data indexing structure} -In the following experiments, we measured the indexing on a single node in the cluster, because each node needs to perform indexing for spatial retrieval. We investigated the retrieval performance of the indexing for remote sensing images. +\subsection{Evaluating the Data Indexing Structure} +In the following experiments, we measured the indexing performance on a single node in the cluster, because each node needs to perform indexing for spatial retrieval. We investigated the retrieval performance of the indexing for remote sensing images. 
We compare three representative execution schemes: -\begin{enumerate} - \item \textbf{Baseline 1 (Full-file Retrieval):} A traditional system that utilizes spatio-temporal indexing for metadata filtering but performs full-file retrieval during data extraction. - \item \textbf{Baseline 2 (Window-based I/O):} A state-of-the-art system (e.g., OpenDataCube) that supports fine-grained partial reading to minimize I/O volume, representing the theoretical optimum for data selectivity. \item \textbf{Ours (I/O-aware Indexing):} The proposed approach utilizing the dual-layer G2I and I2G inverted structure, which pre-materializes grid-to-pixel mappings to enable deterministic partial reading without runtime geometric computations. +\begin{enumerate} + \item \textbf{Baseline 1 (Full-file Retrieval):} A traditional system that utilizes spatio-temporal indexing for metadata filtering but performs full-file retrieval during data extraction. + \item \textbf{Baseline 2 (Window-based I/O):} A state-of-the-art system (e.g., OpenDataCube) that supports fine-grained partial reading to minimize I/O volume, representing the theoretical optimum for data selectivity. + \item \textbf{Ours (I/O-aware Indexing):} The proposed approach utilizing the dual-layer G2I and I2G inverted structure, which pre-materializes grid-to-pixel mappings to enable deterministic partial reading without runtime geometric computations. \end{enumerate} \subsubsection{I/O Selectivity Analysis}\label{sec:Index_exp_1} @@ -976,9 +977,6 @@ Fig.~\ref{fig:cc_exp3}(a) plots the total physical data read size. The baseline Fig.~\ref{fig:cc_exp3}(b) illustrates the number of backend I/O requests (IOPS). The baseline generates distinct I/O requests for every client, reaching 12,800 requests at $N=64$. This massive influx of small random reads overwhelms the storage scheduler, causing excessive disk head seek latency. Our system keeps the request count low and stable. 
Even at peak load ($N=64$), the number of physical I/O requests is suppressed to 2,000. -\par -To explain the performance gains observed above, we analyzed the internal I/O behavior. Fig.~\ref{fig:cc_exp3} compares the physical data movement against logical query demands. Note that in this experiment, Baseline A and Baseline B are grouped as a single baseline, as neither implements window-level coordination. - \par Fig.~\ref{fig:cc_exp3}(a) and Fig.~\ref{fig:cc_exp3}(b) demonstrate the Request Collapse effect. While 64 concurrent clients generate 12,800 IOPS in the baseline, our system collapses them into fewer than 600 physical operations. @@ -991,15 +989,15 @@ Fig.~\ref{fig:cc_exp3}(a) and Fig.~\ref{fig:cc_exp3}(b) demonstrate the Request \end{figure} \par -To validate the necessity of the proposed hybrid switching mechanism, we compared our adaptive approach against two static execution policies: non-deterministic and deterministic mode. We swept the spatial overlap ratio from 0 to 0.9 to capture the performance crossover zone. The workload concurrency was fixed at a moderate level ($N=16$) to highlight the sensitivity to spatial contention. +To validate the necessity of the proposed hybrid switching mechanism, we compared our adaptive approach against two static execution policies: non-deterministic mode and deterministic mode. We swept the spatial overlap ratio from 0 to 0.9 to capture the performance crossover zone. The workload concurrency was fixed at a moderate level ($N=16$) to highlight the sensitivity to spatial contention. Fig.~\ref{fig:cc_exp4} depicts the average latency curves for the three strategies. The results demonstrate that neither static policy is universally optimal, validating the design of our contention-aware switch. In the low-overlap condition, the non-deterministic mode achieves the lower latency. In contrast, the deterministic mode incurs a higher baseline latency. 
This performance gap is attributed to the inherent coordination overhead. Even when no conflicts exist, the deterministic scheduler must still perform time-window batching and plan sorting. Consequently, enforcing global ordering for disjoint queries introduces unnecessary serialization latency. -As $\sigma$ increases, the non-deterministic approach suffers from rapid performance deterioration, exhibiting an exponential growth trend. At $\sigma=0.4$, its latency spikes to 146 ms. This degradation is caused by severe lock and "I/O" contention and at the storage layer, where uncoordinated threads compete for the same disk blocks. Conversely, the deterministic mode exhibits stable, linear scalability thanks to its request merging capability, maintaining a latency of 110 ms at $\sigma=0.4$. +As $\sigma$ increases, the non-deterministic approach suffers from rapid performance deterioration, exhibiting an exponential growth trend. At $\sigma=0.4$, its latency spikes to 146 ms. This degradation is caused by severe lock and I/O contention at the storage layer, where uncoordinated threads compete for the same disk blocks. Conversely, the deterministic mode exhibits stable, linear scalability thanks to its request merging capability, maintaining a latency of 110 ms at $\sigma=0.4$. Our hybrid approach successfully combines the benefits of both worlds. As shown in Fig.~\ref{fig:cc_exp4}, the hybrid curve (blue line) tightly tracks the lower performance envelope of the two static policies. The system identifies the intersection point at $\sigma \approx 0.34$. When contention is below this threshold, it operates optimistically to minimize latency. Once contention exceeds it, the system automatically engages the deterministic scheduler to prevent thrashing. -\subsection{Evaluating the I/O tuning} +\subsection{Evaluating the I/O Tuning} In this section, we evaluate the effectiveness of the proposed SA-GMAB tuning framework. 
The experiments are designed to verify four key properties: fast convergence speed, robustness against stochastic noise, adaptability to workload shifts, and tangible end-to-end performance gains. For comparison, we benchmark against three representative tuning strategies: @@ -1056,7 +1054,7 @@ We further investigated the system's resilience in non-stationary environments. Fig.~\ref{fig:tune_exp3} illustrates the latency evolution before and after the shift. At $t=60$, the workload transition causes an immediate performance collapse across all methods, with latency spiking from a stable $\approx 50$ ms to $>300$ ms. This confirms that the configuration optimal for the previous phase is detrimental in the new environment. The GA-based method fails to adapt effectively. Post-shift, its latency hovers around $290-300$ ms. Lacking a mechanism to quickly reset or guide exploration, the genetic algorithm remains trapped in the local optima of the previous workload, exhibiting almost zero recovery within the observation window. TunIO manages to reduce latency but at a slow pace. It takes 40 steps to lower the latency from 308 ms to 134 ms ($t=100$). While the RL agent eventually learns the new reward function, the high sample complexity delays the recovery, leaving the system in a suboptimal state for a prolonged period. In contrast, SA-GMAB executes a decisive recovery. By leveraging the surrogate model to filter high-uncertainty candidates, it rapidly identifies the new optimal region. The latency drops to $\approx 88$ ms at $t=80$ and further stabilizes at $\approx 74$ ms at $t=100$. \section{Conclusions}\label{sec:Con} -This paper presents a comprehensive I/O-aware retrieval approach designed to strictly bound retrieval latency and maximize throughput for large-scale spatio-temporal analytics. 
By introducing the "Index-as-an-Execution-Plan" paradigm, the dual-layer inverted index bridges the semantic gap between logical indexing and physical storage, effectively shifting the computational burden from retrieval time to ingestion time. To address the scalability challenges in concurrent environments, we developed a hybrid concurrency-aware I/O coordination protocol that adaptively switches between deterministic ordering and optimistic execution based on spatial contention. Furthermore, to handle the complexity of parameter configuration in fluctuating workloads, we integrated the SA-GMAB method for online automatic I/O tuning. The experimental results indicate that (1) I/O-aware index achieves an order-of-magnitude latency reduction with negligible storage overhead, (2) the hybrid coordination protocol realizes a $54\times$ throughput improvement in high-overlap scenarios, and (3) the SA-GMAB method recovers from workload shifts $2\times$ faster than RL baselines while maximizing RoTI. Future work will explore extending the coordination protocol to support more complex analytical operators, such as distributed pixel-level join and aggregation, and integrating the tuning framework with tiered storage hierarchies to further optimize performance in cloud-native environments. +This paper presents a comprehensive I/O-aware retrieval approach designed to strictly bound retrieval latency and maximize throughput for large-scale spatio-temporal analytics. By introducing the ``Index-as-an-Execution-Plan'' paradigm, the dual-layer inverted index bridges the semantic gap between logical indexing and physical storage, effectively shifting the computational burden from retrieval time to ingestion time. To address the scalability challenges in concurrent environments, we developed a hybrid concurrency-aware I/O coordination protocol that adaptively switches between deterministic ordering and optimistic execution based on spatial contention. 
Furthermore, to handle the complexity of parameter configuration in fluctuating workloads, we integrated the SA-GMAB method for online automatic I/O tuning. The experimental results indicate that: (1) I/O-aware indexing achieves an order-of-magnitude latency reduction with negligible storage overhead; (2) the hybrid coordination protocol realizes a $54\times$ throughput improvement in high-overlap scenarios; and (3) the SA-GMAB method recovers from workload shifts $2\times$ faster than RL baselines while maximizing RoTI. Future work will explore extending the coordination protocol to support more complex analytical operators, such as distributed pixel-level join and aggregation, and integrating the tuning framework with tiered storage hierarchies to further optimize performance in cloud-native environments. % if have a single appendix: %\appendix[Proof of the Zonklar Equations]