Remove \par commands

This commit is contained in:
2026-02-02 14:45:20 +08:00
parent 54bc3a2abf
commit 0ddf4c43f7


@@ -63,13 +63,10 @@ Remote sensing data management, Spatio-temporal range retrievals, I/O-aware inde
\section{Introduction}
\IEEEPARstart{A}{massive} amount of remote sensing (RS) data, characterized by high spatial, temporal, and spectral resolutions, is being generated at an unprecedented speed due to the rapid advancement of Earth observation missions \cite{Ma15RS_bigdata}. For instance, NASA's AVIRIS-NG acquires nearly 9 GB of data per hour, while the EO-1 Hyperion sensor generates over 1.6 TB daily \cite{Haut21DDL_RS}. Beyond the sheer volume of data, these datasets are increasingly subjected to intensive concurrent access from global research communities and real-time emergency response systems (e.g., multi-departmental coordination during natural disasters). Consequently, modern RS platforms are required to provide not only massive storage capacity but also high-throughput retrieval capabilities to satisfy the simultaneous demands of numerous spatio-temporal analysis tasks.
Existing RS data management systems \cite{LEWIS17datacube, Yan21RS_manage1, liu24mstgi} typically decompose a spatio-temporal range retrieval into a decoupled two-phase execution model. The first phase is the metadata filtering phase, which utilizes spatio-temporal metadata (e.g., footprints, timestamps) to identify candidate image files that intersect the retrieval predicate. Recent advancements have transitioned from traditional tree-based indexes \cite{Strobl08PostGIS, Simoes16PostGIST} to scalable distributed schemes based on grid encodings and space-filling curves, such as GeoHash \cite{suwardi15geohash}, GeoSOT \cite{Yan21RS_manage1}, and GeoMesa \cite{hughes15geomesa, Li23TrajMesa}. By leveraging these high-dimensional indexing structures, the search complexity of the first phase has been effectively reduced to $O(\log N)$ or even $O(1)$, making metadata discovery extremely efficient even for billion-scale datasets.
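To make the grid-encoding idea concrete, the following is a minimal GeoHash encoder (the standard bisection algorithm, shown purely for illustration; it is not the implementation of any cited system). Keys sharing a prefix cover nearby areas, which is what turns spatial metadata filtering into cheap key-range scans:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def geohash(lat, lon, precision=9):
    """Encode a point by alternately bisecting the lon/lat ranges.

    Each base-32 character packs 5 interleaved bits, so longer strings
    denote smaller cells, and a shared prefix implies spatial proximity.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, n_bits, use_lon = 0, 0, True
    out = []
    while len(out) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            out.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(out)
```

For example, `geohash(57.64911, 10.40744, 11)` yields `"u4pruydqqvj"`, and any retrieval region reduces to scans over a handful of such key prefixes.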
The second phase is the data extraction phase, where the system reads the actual pixel data from the identified raw image files stored in distributed file systems or object stores. A critical observation in modern high-performance RS analytics is that the primary system bottleneck has fundamentally shifted from the first phase to the second. While the metadata search completes in milliseconds, the end-to-end retrieval latency is now dominated by the massive I/O overhead required to fetch, decompress, and process large-scale raw images. Traditional systems attempted to reduce I/O overhead by pre-slicing tiles and building pyramids (e.g., approaches used in Google Earth Engine \cite{gorelick17GEE} that store metadata in HBase and serve pre-tiled image pyramids), but aggressive tiling increases management complexity and produces many small files.
More recent Cloud-Optimized GeoTIFF (COG) formats and COG-aware frameworks \cite{LEWIS17datacube}, \cite{riotiler25riotiler} exploit internal overviews and window-based I/O to read only the portions of files that spatially intersect a retrieval.
While window-based I/O effectively reduces raw data transfer, it introduces a new computational burden due to the decoupling of logical indexing from physical storage. Current systems operate on a ``Search-then-Compute-then-Read'' model: after identifying candidate files, they must perform fine-grained, per-image geospatial computations at runtime to map retrieval coordinates to precise file offsets and clip boundaries. This runtime geometric resolution becomes computationally prohibitive when processing a large volume of candidate images, often negating the benefits of I/O reduction. Moreover, under concurrent workloads, the lack of coordination among these independent read requests leads to severe I/O contention and storage thrashing, rendering traditional indexing-centric optimizations insufficient for real-time applications.
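The per-image ``Compute'' step can be sketched as follows. This is a simplified illustration (the helper name `query_to_window` is hypothetical), assuming a north-up affine geotransform; real systems must additionally handle reprojection and rotated transforms, which is precisely why this step is costly at scale:

```python
import math

def query_to_window(geotransform, width, height, bbox):
    """Map a geographic bbox to a pixel window (col_off, row_off, n_cols, n_rows).

    geotransform is GDAL-style for a north-up image:
    (x_origin, pixel_width, 0, y_origin, 0, -pixel_height).
    Returns None if the bbox does not intersect the image.
    """
    x0, pw, _, y0, _, neg_ph = geotransform
    ph = -neg_ph  # pixel height as a positive number
    minx, miny, maxx, maxy = bbox
    # Geographic coordinates -> fractional pixel coordinates.
    col_lo = (minx - x0) / pw
    col_hi = (maxx - x0) / pw
    row_lo = (y0 - maxy) / ph  # top edge of the bbox (rows grow downward)
    row_hi = (y0 - miny) / ph  # bottom edge of the bbox
    # Clip to the image extent and snap outward to whole pixels.
    c0 = max(0, math.floor(col_lo))
    r0 = max(0, math.floor(row_lo))
    c1 = min(width, math.ceil(col_hi))
    r1 = min(height, math.ceil(row_hi))
    if c0 >= c1 or r0 >= r1:
        return None
    return (c0, r0, c1 - c0, r1 - r0)
```

Executed once per candidate image at query time, this arithmetic (plus the coordinate transformations it glosses over) is exactly the overhead that the pre-materialized mapping proposed below eliminates.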
To address the aforementioned problems, we propose a novel ``Index-as-an-Execution-Plan'' paradigm to bound the retrieval latency. Unlike conventional approaches that treat indexing and I/O execution as separate stages, our approach integrates fine-grained partial retrieval directly into the indexing structure. By pre-materializing the mapping between logical spatial grids and physical pixel windows, our system enables deterministic I/O planning without runtime geometric computation. To further ensure scalability, we introduce a concurrency control protocol tailored for spatio-temporal range retrievals and an automatic I/O tuning mechanism. The principal contributions of this paper are summarized as follows:
@@ -82,7 +79,6 @@ To address the aforementioned problems, we propose a novel ``Index-as-an-Executi
\item We propose an automatic I/O tuning method to improve the I/O performance of spatio-temporal range retrievals over RS data. The method extends an existing AI-powered I/O tuning framework \cite{Rajesh24TunIO} based on a surrogate-assisted genetic multi-armed bandit algorithm \cite{Preil25GMAB}.
\end{enumerate}
The remainder of this paper is organized as follows:
Section~\ref{sec:RW} presents the related work.
Section~\ref{sec:DF} formalizes the spatio-temporal range retrieval problem.
@@ -98,31 +94,25 @@ This section describes the most salient studies of I/O-efficient spatio-temporal
\subsection{I/O-Efficient Spatio-Temporal Retrieval Processing}
Efficient spatio-temporal query processing for RS data has been extensively studied, with early efforts primarily focusing on metadata organization and index-level pruning in relational database systems.
Traditional approaches typically extend tree-based spatial indexes, such as R-tree \cite{Strobl08PostGIS}, quadtree \cite{Tang12Quad-Tree}, and their spatio-temporal variants \cite{Simoes16PostGIST}, to organize image footprints together with temporal attributes, and are commonly implemented on relational backends (e.g., MySQL and PostgreSQL). These methods provide efficient range filtering for moderate-scale datasets, but their reliance on balanced tree structures often leads to high maintenance overhead and limited scalability as the volume of remote sensing metadata grows rapidly. With the continuous increase in data volume and ingestion rate, recent systems have gradually shifted toward grid-based spatio-temporal indexing schemes deployed on distributed NoSQL stores. By encoding spatial footprints into uniform spatial grids \cite{suwardi15geohash, Yan21RS_manage1} or space-filling curves \cite{liu24mstgi, Yang24GridMesa} and combining them with temporal identifiers, these approaches enable lightweight index construction and better horizontal scalability on backends such as HBase and Elasticsearch. Such grid-based indexes can effectively reduce the candidate search space through coarse-grained pruning and are more suitable for large-scale, continuously growing remote sensing archives.
However, index pruning alone is insufficient to guarantee end-to-end retrieval efficiency for remote sensing workloads, where individual images are usually large and retrieval results require further pixel-level processing. To reduce the amount of raw I/O, Google Earth Engine \cite{gorelick17GEE} relies on tiling and multi-resolution pyramids that physically split images into small blocks. More recent solutions leverage COG and window-based I/O to enable partial reads from monolithic image files; frameworks such as OpenDataCube \cite{LEWIS17datacube} exploit these features to read only the image regions intersecting a retrieval window, thereby reducing unnecessary data transfer. Nevertheless, after candidate images are identified, most systems still perform fine-grained geospatial computations for each image, including coordinate transformations and precise pixel-window derivation, which may incur substantial overhead when many images are involved.
\subsection{Concurrency Control}
Concurrency control has long been studied to provide correctness and high throughput in multi-user database and storage systems, with two broad paradigms dominating the literature: deterministic scheduling \cite{Thomson12Calvin, hong2025deterministic} and non-deterministic schemes \cite{Bernstein812PL}, \cite{KungR81OCC}. Hybrid approaches \cite{WangK16MVOCC}, \cite{Hong25HDCC} that adaptively combine these paradigms seek to exploit the low-conflict efficiency of deterministic execution while retaining the flexibility of optimistic techniques. More recent proposals, such as OOCC \cite{Wu25OOCC}, target read-heavy, disaggregated settings by reducing validation and round-trips for read-only transactions, achieving low latency under OLTP-like workloads. These methods are primarily optimized for record- or key-level access patterns: their metrics and designs emphasize transaction latency, abort rates, and throughput under workloads with small, well-defined read/write sets.
Overall, existing concurrency control mechanisms are largely designed around transaction-level correctness and throughput, assuming record- or key-based access patterns and treating storage I/O as a black box. Their optimization objectives rarely account for I/O amplification or fine-grained storage contention induced by concurrent range retrievals. Consequently, these approaches are ill-suited for data-intensive spatio-temporal workloads, where coordinating overlapping window reads and mitigating storage-level interference are critical to achieving scalable performance under multi-user access.
\subsection{I/O Performance Tuning in Storage Systems}
I/O performance tuning has been extensively studied in the context of HPC and data-intensive storage systems, where complex multi-layer I/O stacks expose a large number of tunable parameters. These parameters span different layers, including application-level I/O libraries, middleware, and underlying storage systems, and their interactions often lead to highly non-linear performance behaviors. As a result, manual tuning is time-consuming and error-prone, motivating a wide range of auto-tuning approaches \cite{Peng26IOsurvey}.
Several studies focus on improving the efficiency of the tuning pipeline itself by reformulating the search space or optimization objectives. Chen et al. \cite{Chen21Tuning1} proposed a meta multi-objectivization model that introduces auxiliary performance objectives to mitigate premature convergence to local optima. While such techniques can improve optimization robustness, they are largely domain-agnostic and do not explicitly account for the characteristics of I/O-intensive workloads. Other works, such as the contextual bandit-based approach by Bez et al. \cite{Bez20TuningLayer}, optimize specific layers of the I/O stack (e.g., I/O forwarding) by exploiting observed access patterns. However, these methods are primarily designed for administrator-level tuning and target isolated components rather than end-to-end application I/O behavior \cite{Yang22end-IO}.
User-level I/O tuning has also been explored, most notably by H5Tuner \cite{Behzad13HDF5}, which employs genetic algorithms to optimize the configuration of the HDF5 I/O library. Although effective for single-layer tuning, H5Tuner does not consider cross-layer interactions and lacks mechanisms for reducing tuning cost, such as configuration prioritization or early stopping.
More recently, TunIO \cite{Rajesh24TunIO} proposed an AI-powered I/O tuning framework that explicitly targets the growing configuration spaces of modern I/O stacks. TunIO integrates several advanced techniques, including I/O kernel extraction, smart selection of high-impact parameters, and reinforcement learning-driven early stopping, to balance tuning cost and performance gain across multiple layers. Despite its effectiveness, TunIO and related frameworks primarily focus on single-application or isolated workloads, assuming stable access patterns during tuning. Retrieval-level I/O behaviors, such as fine-grained window access induced by spatio-temporal range retrievals, as well as interference among concurrent users, are generally outside the scope of existing I/O tuning approaches \cite{Wang26RethinkingTuning}.
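The bandit-style tuning loop underlying such frameworks can be sketched as follows. This is a generic epsilon-greedy multi-armed bandit over candidate I/O configurations, given only as an illustration of the idea; TunIO and the surrogate-assisted genetic bandit of \cite{Preil25GMAB} use considerably more sophisticated arm-generation and early-stopping machinery:

```python
import random

def tune_io(configs, measure_throughput, rounds=100, epsilon=0.2, seed=42):
    """Epsilon-greedy bandit: each arm is one candidate I/O configuration.

    measure_throughput(config) runs a short benchmark and returns a reward
    (e.g., MB/s). The loop balances exploring untried configurations
    against exploiting the best one observed so far.
    """
    rng = random.Random(seed)
    counts = [0] * len(configs)
    means = [0.0] * len(configs)
    for _ in range(rounds):
        if rng.random() < epsilon or all(c == 0 for c in counts):
            arm = rng.randrange(len(configs))          # explore
        else:
            arm = max(range(len(configs)), key=lambda i: means[i])  # exploit
        reward = measure_throughput(configs[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    best = max(range(len(configs)), key=lambda i: means[i])
    return configs[best]
```

In a real deployment each `measure_throughput` call is expensive, which is why reducing the number of evaluations (via surrogates, parameter pruning, or early stopping) dominates the design of practical tuners.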
\section{Definition}\label{sec:DF}
This section formalizes the spatio-temporal range retrieval problem and establishes the cost models for retrieval execution. We assume a distributed storage environment where large-scale remote sensing images are stored as objects or files.
Definition 1 (Spatio-temporal Remote Sensing Image). A remote sensing image $R$ is defined as a tuple:
\vspace{-0.05in}
\begin{equation}
@@ -131,7 +121,6 @@ Definition 1 (Spatio-temporal Remote Sensing Image). A remote sensing image $R$
\end{equation}
where $id$ is the unique identifier; $\Omega = [0, W] \times [0, H]$ denotes the pixel coordinate space; $\mathcal{D}$ represents the raw pixel data; and $t$ is the temporal validity interval. The image is associated with a spatial footprint $MBR(R)$ in the global coordinate reference system.
Definition 2 (Spatio-temporal Range Retrieval). Given a dataset $\mathbb{R}$, a retrieval query $Q$ is defined by a spatio-temporal predicate $Q = \langle S, T \rangle$, where $S$ is the spatial bounding box and $T$ is the time interval. The retrieval result set $\mathcal{R}_Q$ is defined as:
\vspace{-0.05in}
\begin{equation}
@@ -139,24 +128,19 @@ Definition 2 (Spatio-temporal Range Retrieval). Given a dataset $\mathbb{R}$, a
\mathcal{R}_Q=\left\{ R\in \mathbb{R}\mid MBR\left( R \right) \cap S\ne \emptyset \land R.t\cap T\ne \emptyset \right\} .
\end{equation}
For each $R \in \mathcal{R}_Q$, the system must return the pixel matrix corresponding to the intersection region $MBR(R) \cap S$.
Definition 3 (Retrieval Execution Cost Model). The execution latency of a retrieval $Q$, denoted as $Cost(Q)$, is composed of two phases: metadata filtering and data extraction.
\begin{equation}
\label{eqn:cost_total}
Cost\left( Q \right) =C_{meta}\left( Q \right) +\sum_{R\in \mathcal{R}_Q}{\left( C_{geo}\left( R,Q \right) +C_{io}\left( R,Q \right) \right)}.
\end{equation}
Here, $C_{meta}(Q)$ is the cost of identifying candidate images $\mathcal{R}_Q$ using indices. The data extraction cost for each image consists of two components: geospatial computation cost ($C_{geo}$) and I/O access cost ($C_{io}$). $C_{geo}$ is the CPU time required to calculate the pixel-to-geographic mapping, determine the exact read windows (offsets and lengths), and handle boundary clipping. In window-based partial reading schemes, this cost is non-negligible due to the complexity of coordinate transformations. $C_{io}$ is the latency to fetch the actual binary data from storage.
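As an illustration, Eq.~(\ref{eqn:cost_total}) transcribes directly into code (hypothetical per-image measurements; all values in the same latency unit):

```python
def total_cost(c_meta, per_image_costs):
    """Cost(Q) = C_meta(Q) + sum over R in R_Q of (C_geo(R,Q) + C_io(R,Q)).

    per_image_costs is a list of (c_geo, c_io) pairs, one per candidate
    image, e.g. milliseconds of CPU time and storage latency respectively.
    """
    return c_meta + sum(c_geo + c_io for c_geo, c_io in per_image_costs)
```

Because the sum ranges over every candidate image, even a small per-image $C_{geo}$ dominates once $|\mathcal{R}_Q|$ is large; pre-materializing read windows drives these terms toward zero, which is the motivation for the proposed index.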
Definition 4 (Concurrent Spatio-temporal Retrievals). Let $\mathcal{Q} = \{Q_1, Q_2, \ldots, Q_N\}$ denote a set of spatio-temporal range retrievals issued concurrently by multiple users.
Each retrieval $Q_i$ independently specifies a spatio-temporal window $\langle S_i, T_i \rangle$ and may overlap with others in both spatial and temporal dimensions. Concurrent execution of $\mathcal{Q}$ may induce overlapping partial reads over the same images or image regions, leading to redundant I/O and storage-level contention if retrievals are processed independently.
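One standard way to reduce the redundant I/O just described is to coalesce overlapping or near-adjacent read ranges targeting the same file before issuing them to storage. The sketch below (a hypothetical helper, using 1-D byte ranges for simplicity; actual pixel windows are 2-D) shows the core merging step:

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge overlapping or near-adjacent (offset, length) read ranges.

    Ranges whose gap is at most max_gap bytes are merged, so one larger
    sequential read replaces several small scattered ones. Returns the
    merged ranges sorted by offset.
    """
    if not ranges:
        return []
    spans = sorted((off, off + length) for off, length in ranges)
    merged = [spans[0]]
    for start, end in spans[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end + max_gap:  # overlap or small gap: merge
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return [(start, end - start) for start, end in merged]
```

A nonzero `max_gap` trades a little wasted transfer for fewer storage requests, a common knob when reading from object stores with high per-request latency.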
\textbf{Problem Statement (Latency-Optimized Concurrent Retrieval Processing).} Given a dataset $\mathbb{R}$ and a concurrent workload $\mathcal{Q}$, the objective is to minimize the total execution latency:
\vspace{-0.05in}
\begin{equation}
@@ -187,27 +171,20 @@ This section introduces the details of the indexing structure for spatio-tempora
\end{figure}
\subsection{Index schema design}
To enable I/O-efficient spatio-temporal query processing, we first decompose the global spatial domain into a uniform grid that serves as the basic unit for query pruning and data access coordination. Specifically, we adopt a fixed-resolution global tiling scheme based on the Web Mercator (or EPSG:4326) coordinate system, using zoom level 14 to partition the Earth's surface into fine-grained grid cells (experiments show that the 14-level grid has the highest indexing efficiency, as discussed in Section~\ref{sec:Index_exp_3}). This resolution strikes a practical balance between spatial selectivity and index size: finer levels would significantly increase metadata volume and maintenance cost, while coarser levels would reduce pruning effectiveness and lead to unnecessary image I/O.
At this scale, each grid cell typically corresponds to a spatial extent comparable to common query footprints and to the internal tiling granularity used by modern raster formats, making it well suited for partial data access.
\par
\textbf{Grid-to-Image Mapping (G2I).}
Based on the grid decomposition, we construct a grid-centric inverted index to associate spatial units with covering images. In our system, each grid cell is assigned a unique \emph{GridKey}, encoded as a 64-bit Z-order value to preserve spatial locality and enable efficient range scans in key-value stores such as HBase. The \emph{G2I table} stores one row per grid cell, where the row key is the GridKey and the value contains the list of image identifiers (ImageKeys) whose spatial footprints intersect the corresponding cell, as illustrated in Fig.~\ref{fig:index}(a).
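The 64-bit Z-order GridKey can be obtained by bit-interleaving the two integer tile coordinates of a cell. The following sketch shows a standard Morton encoding; the function and variable names are ours, not the system's API:

```python
def part1by1(n: int) -> int:
    """Spread the bits of a 32-bit integer so they occupy even bit positions."""
    n &= 0xFFFFFFFF
    n = (n | (n << 16)) & 0x0000FFFF0000FFFF
    n = (n | (n << 8))  & 0x00FF00FF00FF00FF
    n = (n | (n << 4))  & 0x0F0F0F0F0F0F0F0F
    n = (n | (n << 2))  & 0x3333333333333333
    n = (n | (n << 1))  & 0x5555555555555555
    return n

def grid_key(tx: int, ty: int) -> int:
    """64-bit Z-order GridKey from integer tile coordinates (x in even bits)."""
    return (part1by1(ty) << 1) | part1by1(tx)
```

Because spatially adjacent cells share long key prefixes, a rectangular region maps to a small number of contiguous key ranges, which is what makes HBase range scans over GridKeys effective.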
This grid-to-image mapping allows retrieval processing to begin with a lightweight enumeration of grid cells covered by a retrieval region, followed by direct lookups of candidate images via exact GridKey matches. By treating each grid cell as an independent spatial bucket, the G2I table provides efficient metadata-level pruning and avoids costly geometric intersection tests over large image footprints.
However, the G2I table alone is insufficient for I/O-efficient retrieval execution. While it identifies which images are relevant to a given grid cell, it does not capture how the grid cell maps to pixel regions within each image. As a result, a grid-only representation cannot directly guide partial reads and would still require per-image geospatial computations at retrieval time. Therefore, the G2I table functions as a coarse spatial filter and must be complemented by an image-centric structure that materializes the correspondence between grid cells and pixel windows, enabling fine-grained, window-based I/O.
\textbf{Image-to-Grid Mapping (I2G).}
To complement the grid-centric G2I table and enable fine-grained, I/O-efficient data access, we introduce an image-centric inverted structure, referred to as the Image-to-Grid mapping (I2G). In contrast to G2I, which organizes metadata by spatial grids, the I2G table stores all grid-level access information of a remote sensing image in a single row. Each image therefore occupies exactly one row in the table, significantly improving locality during retrieval execution.
As illustrated in Fig.~\ref{fig:index}(b), the row key of the I2G table is the \emph{ImageKey}, i.e., the unique identifier of a remote sensing image. The row value is organized into three column families, each serving a distinct role in retrieval-time pruning and I/O coordination:
\textit{Grid--Window Mapping.}
This column family records the list of grid cells intersected by the image together with their corresponding pixel windows in the image coordinate space. Each entry has the form:
\begin{equation}
\left\langle GridKey,\ W_{ImageKey\_GridKey} \right\rangle,
\end{equation}
where \textit{GridKey} identifies a grid cell at the chosen global resolution, and $W_{ImageKey\_GridKey}$ denotes the minimal pixel bounding rectangle within the image that exactly covers that grid cell.
These precomputed window offsets allow the retrieval executor to directly issue windowed reads on large raster files without loading entire images into memory or recomputing geographic-to-pixel transformations at retrieval time. As a result, grid cells become the smallest unit of coordinated I/O, enabling precise partial reads and effective elimination of redundant disk accesses.
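To illustrate how a stored pixel window turns into byte-range requests, consider an idealized uncompressed, row-major, single-band raster with no header; real formats such as tiled GeoTIFF instead require per-tile offsets, so the helper below is a hypothetical sketch of the principle, not the system's reader:

```python
def window_byte_ranges(img_width, col_off, row_off, width, height, bytes_per_pixel=2):
    """Byte ranges (offset, length) covering a pixel window, one per raster row,
    assuming an uncompressed, row-major, single-band layout with no file header."""
    row_stride = img_width * bytes_per_pixel   # bytes per full raster row
    run = width * bytes_per_pixel              # bytes per window row
    return [(r * row_stride + col_off * bytes_per_pixel, run)
            for r in range(row_off, row_off + height)]
```

Each returned pair can be issued directly as a byte-range GET against an object store, so the data actually read is proportional to the window, not the scene.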
\textit{Temporal Metadata.}
To support spatio-temporal range retrievals, each image row includes a lightweight temporal column family that stores its acquisition time information, such as the sensing timestamp or time interval. This metadata enables efficient temporal filtering to be performed jointly with spatial grid matching, without consulting external catalogs or secondary indexes.
\textit{Storage Pointer.}
This column family contains the information required to retrieve image data from the underlying storage system. It stores a stable file identifier, such as an object key in an object store (e.g., MinIO/S3) or an absolute path in a POSIX-compatible file system. By decoupling logical image identifiers from physical storage locations, this design supports flexible deployment across heterogeneous storage backends while allowing the retrieval engine to directly access image files once relevant pixel windows have been identified.
The I2G table offers several advantages. First, all grid-level access information for the same image is co-located in a single row, avoiding repeated random lookups and improving cache locality during retrieval execution. Second, by materializing grid-to-window correspondences at ingestion time, the system completely avoids expensive per-retrieval geometric computations and directly translates spatial overlap into byte-range I/O requests. Third, the number of rows in the I2G table scales with the number of images rather than the number of grid cells, substantially reducing metadata volume and maintenance overhead.
During data ingestion, the grid--window mappings are generated by projecting grid boundaries into the image coordinate system using the image's georeferencing parameters. This process requires only lightweight affine or RPC transformations and does not involve storing explicit geometries or performing polygon clipping. As a result, the I2G structure enables efficient partial reads while keeping metadata compact and ingestion costs manageable.
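The ingestion-time projection can be sketched as follows, assuming a north-up GDAL-style affine geotransform without shear terms; the function name and tuple layout are illustrative, not the system's actual interface:

```python
import math

def grid_to_window(grid_bounds, geotransform):
    """Project a grid cell's geographic bounds into image pixel space.
    grid_bounds: (xmin, ymin, xmax, ymax) in the image CRS.
    geotransform: (origin_x, pixel_w, origin_y, pixel_h), a north-up affine
    with pixel_h < 0 (GDAL convention); shear terms omitted for brevity."""
    xmin, ymin, xmax, ymax = grid_bounds
    ox, pw, oy, ph = geotransform
    col0 = math.floor((xmin - ox) / pw)
    col1 = math.ceil((xmax - ox) / pw)
    row0 = math.floor((ymax - oy) / ph)   # ph < 0: top edge -> smallest row
    row1 = math.ceil((ymin - oy) / ph)
    return (col0, row0, col1 - col0, row1 - row0)  # (col_off, row_off, w, h)
```

Running this once per intersected grid cell at ingestion time is what allows $C_{geo}$ to be removed from the retrieval critical path.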
\subsection{Retrieval-time Execution}
The I/O-aware index enables efficient spatio-temporal range retrievals by directly translating retrieval predicates into windowed I/O operations. Given a retrieval
$q = \langle [x_{\min}, y_{\min}, x_{\max}, y_{\max}], [t_s, t_e] \rangle$,
the system resolves the retrieval through three consecutive stages: \emph{Grid Enumeration}, \emph{Candidate Image Retrieval with Temporal Pruning}, and \emph{Windowed Read Plan Generation}. As illustrated in Fig.~\ref{fig_ST_Query}, this execution pipeline bridges high-level retrieval predicates and low-level I/O operations in a fully deterministic manner.
\textbf{Grid Enumeration.}
As shown in Step~1 and Step~2 of Fig.~\ref{fig_ST_Query}, the retrieval execution starts by rasterizing the spatial footprint of $q$ into the fixed global grid at zoom level 14. Instead of performing recursive space decomposition as in quadtrees or hierarchical spatial indexes, our system enumerates the minimal set of grid cells
$\{g_1, \ldots, g_k\}$
whose footprints intersect the retrieval bounding box.
Each grid cell corresponds to a unique 64-bit \textit{GridKey}, which directly matches the primary key of the G2I table. This design has important implications: grid enumeration has constant depth and low computational cost, and the resulting GridKeys can be directly used as lookup keys without any geometric refinement. Consequently, spatial key generation is reduced to simple arithmetic operations on integer grid coordinates.
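Grid enumeration therefore reduces to integer tile arithmetic. Below is a minimal sketch using the standard slippy-map tiling convention, which we assume here for illustration (the system's exact grid math may differ):

```python
import math

ZOOM = 14  # fixed global resolution used by the index

def lonlat_to_tile(lon: float, lat: float, z: int = ZOOM) -> tuple[int, int]:
    """Standard Web Mercator (slippy-map) tile coordinates at zoom z."""
    n = 2 ** z
    tx = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    ty = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return min(tx, n - 1), min(ty, n - 1)

def enumerate_grids(xmin, ymin, xmax, ymax, z: int = ZOOM):
    """Enumerate tile coordinates of all grid cells intersecting the bbox."""
    tx0, ty1 = lonlat_to_tile(xmin, ymin, z)   # south-west corner
    tx1, ty0 = lonlat_to_tile(xmax, ymax, z)   # north-east corner
    return [(tx, ty) for ty in range(ty0, ty1 + 1)
                     for tx in range(tx0, tx1 + 1)]
```

The resulting tile coordinates can be fed directly into a Z-order encoder to obtain the GridKeys used as G2I lookup keys.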
\textbf{Candidate Image Retrieval with Temporal Pruning.}
Given the enumerated grid set $\{g_1, \ldots, g_k\}$, the retrieval processor performs a batched multi-get on the G2I table. Each G2I row corresponds to a single grid cell and stores the identifiers of all images whose spatial footprints intersect that cell. For each grid $g_i$, the lookup returns:
\begin{equation}
G2I[g_i] = \{ imgKey_1, \ldots, imgKey_m \}.
\end{equation}
All retrieved image identifiers are unioned to form the spatial candidate set
$C_s = \bigcup_{i=1}^{k} G2I[g_i]$.
This step eliminates the need for per-image polygon intersection tests that are commonly required in spatial databases and data cube systems.
To incorporate the temporal constraint $[t_s, t_e]$, each candidate image in $C_s$ is further filtered using the temporal column family of the Image-to-Grid (I2G) table. Images whose acquisition time does not intersect the retrieval interval are discarded early, yielding the final candidate set $C$. This lightweight temporal pruning is performed without accessing any image data and introduces negligible overhead.
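The two pruning steps can be sketched with in-memory dictionaries standing in for the G2I table and the I2G temporal column family (a deliberate simplification of the HBase-backed implementation):

```python
def candidate_images(g2i, i2g_time, grid_keys, t_s, t_e):
    """Spatial candidates via G2I lookups, then temporal pruning via I2G metadata.
    g2i: dict GridKey -> set of ImageKeys (stand-in for the G2I table).
    i2g_time: dict ImageKey -> (t_start, t_end) acquisition interval."""
    c_s = set()
    for g in grid_keys:                      # a batched multi-get in the real store
        c_s |= g2i.get(g, set())
    # keep images whose acquisition interval intersects [t_s, t_e]
    return {img for img in c_s
            if i2g_time[img][0] <= t_e and i2g_time[img][1] >= t_s}
```

Note that no pixel data is touched: both steps operate purely on metadata rows, which is why the pruning overhead stays in the millisecond range.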
\textbf{Windowed Read Plan Generation.}
As shown in Step~3 of Fig.~\ref{fig_ST_Query}, the final stage translates the candidate image set into a concrete I/O plan. For each image $I \in C$, the retrieval executor issues a selective range-get on the I2G table to retrieve only the grid--window mappings relevant to the retrieval grids:
\begin{equation}
I2G\left[ I, \{g_1, \ldots, g_k\} \right] = \left\{ W_{I\_g_i} \mid g_i \cap I \ne \emptyset \right\}.
\end{equation}
Each $W_{I\_g_i}$ specifies the exact pixel window in the original raster file that corresponds to grid cell $g_i$. Since these window offsets are precomputed during ingestion, retrieval execution requires only key-based lookups and arithmetic filtering. No geographic coordinate transformation, polygon clipping, or raster--vector intersection is performed at retrieval time.
The resulting collection of pixel windows constitutes a \emph{windowed read plan}, which can be directly translated into byte-range I/O requests against the storage backend. This approach avoids loading entire scenes and ensures that the total I/O volume is proportional to the retrieved spatial extent rather than the image size.
\subsection{Why I/O-aware}
The key reason our indexing design is I/O-aware lies in the fact that the index lookup results are not merely candidate identifiers, but constitute a concrete I/O access plan. Unlike traditional spatial indexes, where retrieval processing yields a set of objects that must still be fetched through opaque storage accesses, our Grid-to-Image and Image-to-Grid lookups deterministically produce the exact pixel windows to be read from disk. As a result, the logical retrieval plan and the physical I/O plan are tightly coupled: resolving a spatio-temporal predicate directly specifies which byte ranges should be accessed and which can be skipped.
This tight coupling fundamentally changes the optimization objective. Instead of minimizing index traversal cost or result-set size, the system explicitly minimizes data movement by ensuring that disk I/O is proportional to the retrieval's spatio-temporal footprint. Consequently, the index serves as an execution-aware abstraction that bridges retrieval semantics and storage behavior, enabling predictable, bounded I/O under both single-retrieval and concurrent workloads.
\textbf{Theoretical Cost Analysis.}
To rigorously quantify the performance advantage, we revisit the retrieval cost model defined in Eq.~(\ref{eqn:cost_total}). In traditional full-image reading systems, although the geospatial computation cost is negligible ($C_{geo} = 0$) as no clipping is performed, the I/O cost $C_{io}$ is determined by the full file size. Consequently, the total latency is entirely dominated by massive I/O overhead, rendering $C_{meta}$ (typically milliseconds) irrelevant.
Existing window-based I/O systems (e.g., ODC or COG-aware libraries) successfully reduce the I/O cost to the size of the requested window. However, this reduction comes at the expense of a significant surge in $C_{geo}$. For every candidate image, the system must perform on-the-fly coordinate transformations and polygon clipping to calculate read offsets. When a retrieval involves thousands of images, the accumulated CPU time ($\sum C_{geo}$) becomes a new bottleneck (e.g., hundreds of milliseconds to seconds), often negating the benefits of I/O reduction (detailed quantitative comparisons are provided in Sec.~\ref{sec:Index_exp_2}).
In contrast, our I/O-aware indexing approach fundamentally alters this trade-off. By materializing the grid-to-pixel mapping in the I2G table, we effectively shift the computational burden from retrieval time to ingestion time. Although the two-phase lookup (G2I and I2G) introduces a slight overhead compared to simple tree traversals, $C_{meta}$ remains in the order of milliseconds, orders of magnitude smaller than disk I/O latency. Since the precise pixel windows are pre-calculated and stored, the runtime geospatial computation is effectively eliminated, i.e., $C_{geo} = 0$. The system retains the minimal I/O cost characteristic of window-based approaches, fetching only relevant byte ranges. Therefore, our design achieves the theoretical minimum for both computation and I/O components within the retrieval execution critical path.
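Assuming the additive decomposition $C_{total} = C_{meta} + C_{geo} + C_{io}$ suggested by the three components discussed above (the exact form is defined in Eq.~(\ref{eqn:cost_total})), the trade-off across the three strategies can be summarized as:

```latex
% Per-retrieval cost under the three strategies, assuming the additive
% decomposition C_total = C_meta + C_geo + C_io discussed in the text.
\begin{align*}
\text{Full-image read:}\quad
  & C_{meta} + 0 + \underbrace{C_{io}(\text{whole file})}_{\text{dominant}},\\
\text{On-the-fly windowing:}\quad
  & C_{meta} + \underbrace{\textstyle\sum C_{geo}}_{\text{per-image clipping}}
    + C_{io}(\text{window}),\\
\text{I/O-aware index (ours):}\quad
  & \underbrace{C_{meta}}_{\text{ms-level}} + 0 + C_{io}(\text{window}).
\end{align*}
```

Only the last row keeps both the computation term and the I/O term at their minimum simultaneously.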
\section{Hybrid Concurrency-Aware I/O Coordination}\label{sec:CC}
In this section, we propose a hybrid coordination mechanism that adaptively employs non-deterministic and deterministic execution paths according to the estimated contention among concurrent retrievals.
\subsection{Retrieval Admission and I/O Plan Generation}
When a spatio-temporal range retrieval $Q$ arrives, the system first performs index-driven plan generation. The retrieval footprint is rasterized into the global grid to enumerate the intersecting grid cells. The G2I table is then consulted to retrieve the set of candidate images, followed by selective lookups in the I2G table to obtain the corresponding pixel windows.
As a result, each retrieval is translated into an explicit \emph{I/O access plan} consisting of image--window pairs:
\vspace{-0.05in}
\begin{equation}
Plan(Q) = \left\{ (img_1, w_1), (img_2, w_2), \ldots \right\},
\end{equation}
where each window $w$ denotes a concrete pixel range to be accessed via byte-range I/O.
\subsection{Contention Estimation and Path Selection}
To minimize the overhead of global ordering in low-contention scenarios, the system introduces a Contention-Aware Switch. Upon the arrival of a retrieval batch $\mathcal{Q} = \{Q_1, Q_2, ..., Q_n\}$, the system first estimates the Spatial Overlap Ratio ($\sigma$) among their generated I/O plans.
Let $A(Plan(Q_i))$ be the aggregate spatial area of all pixel windows in the I/O plan of retrieval $Q_i$. The overlap ratio $\sigma$ for a batch is defined as:
\vspace{-0.05in}
\begin{equation}
\sigma = 1 - \frac{A\left( \bigcup_{i=1}^{n} Plan(Q_i) \right)}{\sum_{i=1}^{n} A\left( Plan(Q_i) \right)},
\end{equation}
where $\sigma \in [0, 1]$. A high $\sigma$ indicates that multiple retrievals are competing for the same image regions, leading to high I/O amplification if executed independently.
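One plausible instantiation of this estimator (our reading of the definition; the deployed estimator may differ) compares the unioned area of all plans against the sum of per-plan areas, here using small integer pixel windows and a set-based union purely for illustration:

```python
def area(plan):
    """Covered-cell count of a list of integer pixel windows (img, x0, y0, x1, y1),
    half-open on the right/bottom; set-based union, fine for small examples."""
    cells = set()
    for img, x0, y0, x1, y1 in plan:
        cells |= {(img, x, y) for x in range(x0, x1) for y in range(y0, y1)}
    return len(cells)

def overlap_ratio(plans_per_query):
    """sigma = 1 - area(union of all plans) / sum of per-query plan areas."""
    total = sum(area(p) for p in plans_per_query)
    union = area([w for p in plans_per_query for w in p])
    return 0.0 if total == 0 else 1.0 - union / total
```

Disjoint plans yield $\sigma = 0$; as plans converge on the same windows, $\sigma$ approaches 1, triggering the deterministic path.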
The system utilizes a rule-based assignment mechanism similar to HDCC \cite{Hong25HDCC} to select the execution path:
\begin{enumerate}
\item Path A (Non-deterministic/OCC-style): If $\sigma < \tau$ (where $\tau$ is a configurable threshold), retrievals proceed directly to execution to maximize concurrency.
\item Path B (Deterministic): If $\sigma \ge \tau$, retrievals are routed to the deterministic coordinated path, in which their I/O plans are globally ordered and shared.
\end{enumerate}
\subsection{Deterministic Coordinated and Non-deterministic Execution}
When $\sigma \ge \tau$, the system switches to a deterministic path to mitigate storage-level contention and I/O amplification, as shown in Fig.~\ref{fig:cc}. To coordinate concurrent access to shared storage resources, we introduce a Global I/O Plan Queue that enforces a deterministic ordering over all admitted I/O plans. Each windowed access $(img, w)$ derived from incoming retrievals is inserted into this queue according to a predefined policy, such as FIFO based on arrival time or lexicographic ordering by $(timestamp, RetrievalID)$.
This design is inspired by deterministic scheduling in systems such as Calvin, but differs fundamentally in its scope: the ordering is imposed on \emph{window-level I/O operations} rather than on transactions. As a result, accesses to the same image region across different retrievals follow a globally consistent order, preventing uncontrolled interleaving of reads and reducing contention at the storage layer. The deterministic ordering also provides a stable foundation for subsequent I/O coordination and sharing.
The core of our approach lies in coordinating concurrent windowed reads at the image level. Windows originating from different retrievals may overlap spatially, be adjacent, or even be identical. Executing these requests independently would lead to redundant reads and excessive I/O amplification.
To address this, the system performs three coordination steps within each scheduling interval.

\textit{Stage 1: Global De-duplication.} The system first extracts all windowed access pairs $(img, w)$ from the admitted retrievals and inserts them into a global window set $\mathcal{W}_{total}$. If multiple retrievals $Q_1, Q_2, \ldots, Q_n$ request the same pixel window $w$ from image $img$, the system retains only one unique entry in $\mathcal{W}_{total}$. This stage ensures that any specific byte range is identified as a single logical requirement, effectively preventing the redundant retrieval of overlapping spatial grids.

\textit{Stage 2: Range Merging.} After de-duplication, the system analyzes the physical disk offsets of all unique windows in $\mathcal{W}_{total}$. To improve access locality, windows that are physically contiguous or separated by a gap smaller than a threshold $\theta$ are merged into a single read.

\textit{Stage 3: Dispatching.} This stage maintains a mapping between the physical byte offsets in the shared buffer and the logical window requirements of each active retrieval. Each retrieval $Q_i$ receives only the exact pixel windows $w \in Plan(Q_i)$ it originally requested, either via zero-copy memory mapping where possible or by slicing the shared system buffer into thread-local structures. Thus, while the physical I/O is shared to reduce amplification, the logical execution of each retrieval remains independent and free from irrelevant data interference.
For example, when $Q_1$ requests grids $\{1, 2\}$ and $Q_2$ requests grids $\{2, 3\}$, Stage 1 identifies the unique requirement set $\{1, 2, 3\}$. Stage 2 then merges these into a single contiguous I/O operation covering the entire range $[1, 3]$. In Stage 3, the dispatcher identifies the memory offsets corresponding to grids $1$ and $2$ within the buffer and maps these slices to the private cache of $Q_1$; similarly, it extracts and delivers the slices for grids $2$ and $3$ to $Q_2$.
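The walk-through above can be sketched in code. The following is a minimal, self-contained illustration with hypothetical helper names; the actual engine operates on byte ranges inside image files and uses zero-copy mapping where possible:

```python
from collections import defaultdict

def coordinate_reads(retrievals, gap_threshold):
    """retrievals: {query_id: [(offset, length), ...]} windowed reads on one image.
    Returns the merged physical reads and a per-query dispatch plan."""
    # Stage 1: global de-duplication -- each unique window appears exactly once.
    unique_windows = sorted({w for plan in retrievals.values() for w in plan})

    # Stage 2: range merging -- fuse windows whose physical gap is below the threshold.
    merged = []
    for off, length in unique_windows:
        if merged and off - (merged[-1][0] + merged[-1][1]) <= gap_threshold:
            prev_off, prev_len = merged[-1]
            merged[-1] = (prev_off, max(prev_len, off + length - prev_off))
        else:
            merged.append((off, length))

    # Stage 3: dispatching -- map each logical window back to a slice
    # (merged_offset, offset_within_buffer, length) of the read containing it.
    dispatch = defaultdict(list)
    for qid, plan in retrievals.items():
        for off, length in plan:
            for m_off, m_len in merged:
                if m_off <= off and off + length <= m_off + m_len:
                    dispatch[qid].append((m_off, off - m_off, length))
                    break
    return merged, dict(dispatch)
```

With 4 KB grids, the two-query example above collapses into one physical read of grids $[1,3]$, from which each query receives only its own slices.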
Through these mechanisms, concurrent retrievals collaboratively share I/O, and the execution unit becomes a coordinated window read rather than an isolated request. Importantly, this coordination operates entirely at the I/O planning level and does not require any form of locking or transaction-level synchronization.
When contention remains below the threshold ($\sigma < \tau$), the system prioritizes low latency over merging efficiency by adopting an optimistic dispatch mechanism, as shown in Fig.~\ref{fig:cc}. Instead of undergoing heavyweight sorting, I/O plans are immediately offloaded to the execution engine. By utilizing thread-local sublists, each thread independently handles its byte-range requests.
\subsection{Optimistic Read Execution and Completion}
Once a coordinated window read is scheduled, the system issues the corresponding byte-range I/O request immediately. Read execution is fully optimistic: there is no validation phase, no abort, and no rollback. This is enabled by the immutability of remote-sensing imagery and by the deterministic ordering of I/O plans, which together ensure consistent and repeatable read behavior.
A retrieval is considered complete when all windows in its I/O plan have been served and the associated local processing (e.g., reprojection or mosaicking) has finished. By eliminating validation overhead and allowing read execution to proceed independently once scheduled, the system achieves low-latency retrieval completion while maintaining predictable I/O behavior under concurrency.
Overall, this concurrency-aware I/O coordination mechanism reinterprets concurrency control as a problem of \emph{coordinating shared I/O flows}. By operating at the granularity of windowed reads and leveraging deterministic ordering and optimistic execution, it effectively reduces redundant I/O and improves scalability for multi-user spatio-temporal retrieval workloads.
\section{I/O Stack Tuning}\label{sec:Tuning}
\subsection{Formulation of Online I/O Tuning}
We study a concurrent spatio-temporal retrieval engine that processes many range retrievals simultaneously. The system operates on large remote sensing images stored in shared storage. Unlike traditional HPC jobs or single-application I/O workloads, the system does not run one fixed job. Instead, it continuously receives a stream of user retrievals. Each retrieval is turned into many small I/O operations that often touch overlapping regions in large raster files.
Let $\mathcal{Q} = \{Q_1, Q_2, \ldots, Q_N\}$ denote a stream of spatio-temporal range retrievals submitted by multiple users. Each retrieval $q$ is decomposed by the I/O-aware index into a set of grid-aligned spatial windows based on a predefined global grid system. These windows are further mapped to sub-regions of one or more large remote sensing images. In this way, every retrieval produces an I/O execution context $c= \langle W,M,S \rangle$, where $W$ describes the set of image windows to be accessed, including their sizes, spatial overlap, and distribution across images; $M$ captures window-level coordination opportunities, such as window merging, deduplication, or shared reads across concurrent retrievals; and $S$ represents system-level execution decisions, including batching strategies, I/O scheduling order, and concurrency limits. Importantly, the I/O behavior of the system is not determined solely by static application code, but emerges dynamically from the interaction between retrieval workloads, execution plans, and system policies.
The goal of I/O tuning in this system is to optimize the performance of retrieval-induced I/O execution under continuous, concurrent workloads. We focus on minimizing the observed I/O cost per retrieval, which may be measured by metrics such as average retrieval latency, effective I/O throughput, or amortized disk read time. Let $\theta \in \varTheta$ denote a tuning configuration, where each configuration specifies a combination of system-level I/O control parameters, including window batching size, merge thresholds, queue depth, concurrency limits, and selected storage-level parameters exposed to the engine. Unlike traditional I/O tuning frameworks, the decision variables $\theta$ are applied at the retrieval execution level, rather than at application startup or compilation time.
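As an illustration, such a configuration can be modeled as a record over a small discrete parameter space; the parameter names and domains below are hypothetical stand-ins for the system's actual knobs:

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class IOConfig:
    """One tuning configuration theta (illustrative parameter names)."""
    batch_size: int     # number of windows grouped per I/O submission
    merge_gap_kb: int   # largest gap between byte ranges that still get merged
    queue_depth: int    # outstanding asynchronous requests
    max_parallel: int   # concurrency limit for retrieval worker threads

# Discrete domains; the configuration space is their Cartesian product
# (already 320 points in this toy example, far larger in practice).
DOMAINS = {
    "batch_size": [8, 16, 32, 64],
    "merge_gap_kb": [0, 64, 256, 1024],
    "queue_depth": [4, 8, 16, 32, 64],
    "max_parallel": [2, 4, 8, 16],
}

def random_config(rng: random.Random) -> IOConfig:
    """Sample one configuration uniformly from the discrete space."""
    return IOConfig(**{k: rng.choice(v) for k, v in DOMAINS.items()})

def mutate(cfg: IOConfig, rng: random.Random) -> IOConfig:
    """Genetic-style mutation: resample a single randomly chosen parameter."""
    values = {k: getattr(cfg, k) for k in DOMAINS}
    key = rng.choice(list(DOMAINS))
    values[key] = rng.choice(DOMAINS[key])
    return IOConfig(**values)
```

Because the record is frozen (and therefore hashable), a configuration can serve directly as an arm key in a bandit's memory table.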
For a given tuning configuration $\theta$ and execution context $c$, the observed I/O performance is inherently stochastic due to interference among concurrent retrievals, shared storage contention, and variability in window overlap and access locality. We model the observed performance outcome as a random variable:
\vspace{-0.05in}
\begin{equation}
Y\left( \theta ,c \right) =f\left( \theta ,c \right) +\epsilon ,
\end{equation}
where $f\left( \cdot \right) $ is an unknown performance function and $\epsilon$ captures stochastic noise. Moreover, as retrieval workloads evolve over time, the distribution of execution contexts $c$ may change, making the tuning problem non-stationary.
Given a stream of retrievals $\mathcal{Q}$ and the resulting sequence of execution contexts $\left\{ c_t \right\} $, the problem is to design an online tuning strategy that adaptively selects tuning configurations $\theta _t$ for retrieval execution, so as to minimize the long-term expected I/O cost:
\vspace{-0.05in}
\begin{equation}
\min_{\left\{ \theta _t \right\}} \, \lim_{T\rightarrow \infty} \frac{1}{T}\sum_{t=1}^T{\mathbb{E}\left[ Y\left( \theta _t,c_t \right) \right]},
\end{equation}
subject to practical constraints on tuning overhead and system stability.
}
\end{algorithm}
To address the online I/O tuning problem, we use a Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB) framework. It combines genetic search, bandit-style exploration, and a simple performance model. The goal is to handle workloads where behavior changes over time, where results are random, and where retrievals may affect each other. The main steps of this framework are shown in Algorithm~\ref{alg:sa-gmab}.
We first initialize the memory table and the surrogate model, and then generate an initial population of configurations (lines 1--4). In our system, each arm is an I/O tuning configuration $\theta \in \varTheta$. A configuration is a group of I/O control parameters, such as merge thresholds, batch size, queue depth, and limits on parallel requests. The space of possible configurations is large and discrete. It is not possible to list or test all of them. Therefore, we do not fix all arms in advance. Instead, new configurations are created dynamically by genetic operators during candidate generation (line 6). Each configuration acts as a policy that tells the system how to run I/O plans during a scheduling period.
When a retrieval $q_t$ with context $c_t$ arrives, the framework enters the online tuning loop (line 5). For this retrieval, a set of candidate configurations is created through selection, crossover, and mutation (line 6). For every candidate configuration, the surrogate model predicts its reward under the current context (lines 7--9). These predicted rewards are then used to filter and keep only the top promising configurations, or those with high uncertainty (line 10).
When a configuration $\theta$ is used to process a retrieval $q_t$ with context $c_t$, the system observes a random performance result $Y_t=Y\left( \theta ,c_t \right)$. We define the reward as a simple transformation of I/O cost so that a higher reward means better performance. A common form is the negative latency of the retrieval, or the negative I/O time per unit work. Because other retrievals run at the same time, the reward may change even for the same configuration. Thus, many samples are needed to estimate the expected reward.
For the remaining candidates, the framework computes a bandit score using both the historical average reward and an exploration term (lines 11--13), and then selects the configuration with the highest score (line 14). In this way, the method prefers configurations that have performed well before, but it also tries configurations that have been used only a few times.
The selected configuration is then applied to execute the retrieval (line 15). After execution, the system observes the performance result and converts it into a reward value (line 16). For each configuration $\theta$, the system keeps a memory entry that records how many times it has been used and its average reward. These values are updated after each execution (lines 17--18). This keeps all historical observations instead of discarding older ones, so estimates become more accurate over time, and poor configurations are not repeatedly tried.
The selected configuration may also be added into the population, while poor ones may be removed (line 19). The surrogate model is retrained periodically using data stored in memory (lines 20--22), so that its predictions follow the most recent workload. The tuning step counter is then increased (line 23), and the framework continues with the next retrieval (line 24).
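The select-execute-update core of this loop can be sketched as follows. The surrogate and the genetic candidate generation are abstracted behind simple callables, and all names are illustrative rather than the system's actual API; the reward is assumed to be, e.g., negative retrieval latency:

```python
import math

class GMABTuner:
    """Minimal bandit core of the tuning loop: candidate scoring, selection,
    and memory updates (surrogate training and genetic ops are abstracted)."""

    def __init__(self, explore_weight=1.0):
        self.memory = {}          # config -> (pull_count, mean_reward)
        self.t = 0                # tuning step counter
        self.c = explore_weight

    def score(self, cfg, predicted):
        """UCB-style score: surrogate prediction for unseen arms, otherwise
        the empirical mean plus an exploration bonus that shrinks with pulls."""
        if cfg not in self.memory:
            return predicted + self.c            # optimistic for new arms
        n, mean = self.memory[cfg]
        return mean + self.c * math.sqrt(math.log(self.t + 1) / n)

    def select(self, candidates, surrogate):
        """Pick the candidate configuration with the highest score."""
        return max(candidates, key=lambda cfg: self.score(cfg, surrogate(cfg)))

    def update(self, cfg, reward):
        """Incremental-mean update of the memory entry after execution."""
        n, mean = self.memory.get(cfg, (0, 0.0))
        self.memory[cfg] = (n + 1, mean + (reward - mean) / (n + 1))
        self.t += 1
```

Keeping running counts and means per arm means no observation is discarded, matching the memory-table update in Algorithm~\ref{alg:sa-gmab}.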
\end{table}
\subsubsection{Retrieval Workload}
To evaluate the system performance under diverse scenarios, we developed a synthetic workload generator that simulates concurrent spatio-temporal range retrievals. The retrieval parameters are configured as follows:
\begin{itemize}
\item Spatial Extent: The spatial range of retrievals follows a log-uniform distribution, ranging from small tile-level access ($0.0001\%$ of the scene) to large-scale regional mosaics ($1\%$ to $100\%$ of the scene).
\label{fig:index_exp1}
\end{figure}
First, we evaluated the effectiveness of data reduction by measuring the I/O selectivity, defined as the ratio of the retrieved data volume to the total file size. Fig.~\ref{fig:index_exp1} compares our method against Baseline 1 and Baseline 2. As illustrated in Fig.~\ref{fig:index_exp1}(a), Baseline 1 always reads the entire image regardless of the proportion of the intersection between the query range and the image. In contrast, both Baseline 2 and Ours significantly reduce I/O traffic by enabling partial reads. It is worth noting that our method incurs slightly higher I/O volume compared to the theoretically optimal Baseline 2. This marginal data redundancy is attributed to the grid alignment effect: our index retrieves pixel blocks based on fixed grid boundaries, whereas Baseline 2 performs precise geospatial clipping. Fig.~\ref{fig:index_exp1}(b) further presents the distribution of unnecessary data fraction. While our method introduces a small amount of over-reading due to grid padding, it successfully avoids the massive data waste observed in Baseline 1. As we will demonstrate in the next section, this slight compromise in I/O precision is a strategic trade-off that eliminates expensive runtime computations.
\subsubsection{End-to-End Retrieval Latency}\label{sec:Index_exp_2}
\label{fig:index_exp2}
\end{figure}
We next measured the end-to-end retrieval latency to verify whether the I/O reduction translates into time efficiency. Fig.~\ref{fig:index_exp2}(a) reports the mean and 95th percentile (P95) latency across varying retrieval footprint ratios. The results reveal three distinct performance behaviors: Baseline 1 shows a high and flat latency curve ($\approx 4500$ ms), dominated by the cost of transferring entire images. Baseline 2, despite its optimal I/O selectivity, exhibits a significant latency floor ($\approx 380$ ms for small tile-level retrievals). This overhead stems from the on-the-fly geospatial computations required to calculate precise read windows. Ours achieves the lowest latency, ranging from 34 ms to 59 ms for typical tile-level retrievals. Crucially, for small-to-medium retrievals, our method outperforms Baseline 2 by an order of magnitude. The gap between the two curves highlights the advantage of our deterministic indexing approach: by pre-materializing grid-to-window mappings, we eliminate runtime coordinate transformations. Although our I/O volume is slightly larger (as shown in Sec.~\ref{sec:Index_exp_1}), the time saved by avoiding computational overhead far outweighs the cost of transferring a few extra kilobytes of padding data.
To empirically validate the cost model proposed in Eq.~\ref{eqn:cost_total}, we further decomposed the retrieval latency into three components: metadata lookup ($C_{meta}$), geospatial computation ($C_{geo}$), and I/O access ($C_{io}$). Fig.~\ref{fig:index_exp2}(b) presents the time consumption breakdown for a representative medium-scale retrieval (involving approx. 50 image tiles). As expected, the latency of Baseline 1 is entirely dominated by $C_{io}$, rendering $C_{meta}$ and $C_{geo}$ negligible: the massive data transfer masks all other overheads. While $C_{io}$ of Baseline 2 is successfully reduced to the window size, a new bottleneck emerges in $C_{geo}$: the runtime coordinate transformations and polygon clipping consume nearly $40\%$ of the total execution time ($\approx 350$ ms). This observation confirms our theoretical analysis that window-based I/O shifts the bottleneck from storage to CPU. The proposed method exhibits a balanced profile. Although $C_{meta}$ increases slightly ($\approx 35$ ms) due to the two-phase index lookup (G2I + I2G), this cost is well amortized. Crucially, $C_{geo}$ is effectively eliminated thanks to the pre-computed grid-window mappings. Consequently, our approach achieves a total latency of 580 ms, providing a $1.7\times$ speedup over Baseline 2 by removing the computational bottleneck without regressing on I/O performance.
\label{fig:index_exp3_3}
\end{figure}
To quantify the individual contributions of the G2I (coarse filtering) and I2G (fine-grained access) components, we decomposed the system into four variants. Fig.~\ref{fig:index_exp3} breaks down the performance in terms of I/O volume and latency components. Fig.~\ref{fig:index_exp3}(a) confirms that removing either component leads to suboptimal I/O behavior. The \textit{No Index} and \textit{G2I Only} variants result in 100\% I/O volume, as they lack the window information required for partial access. Conversely, \textit{I2G Only} and \textit{G2I+I2G} achieve minimal I/O volume ($\approx 10\%$).
Fig.~\ref{fig:index_exp3}(b) reveals the latency breakdown. \textit{No Index} suffers from both high full-table scanning cost and high storage I/O cost. \textit{G2I Only} efficiently reduces metadata lookup time ($\approx 50$ ms) but fails to reduce storage I/O ($\approx 8000$ ms). Although \textit{I2G Only} minimizes storage I/O ($\approx 100$ ms), it incurs prohibitive metadata lookup overhead ($\approx 1500$ ms) because the system must scan the entire I2G table to identify relevant images without spatial pruning. \textit{G2I+I2G} achieves the best performance, maintaining low metadata latency ($\approx 60$ ms) via G2I pruning while ensuring minimal storage I/O ($\approx 100$ ms) via I2G windowing.
\label{fig:index_exp4}
\end{figure}
Finally, we evaluated the scalability and cost of maintaining the index. Fig.~\ref{fig:index_exp4} compares our method against PostGIS (R-tree) and GeoMesa (Z-order) during the ingestion of $7\times 10^5$ images. Fig.~\ref{fig:index_exp4}(a) illustrates the ingestion throughput. PostGIS exhibits a degrading trend as the dataset grows, bottlenecked by the logarithmic cost of R-tree rebalancing. In contrast, our method maintains a stable throughput ($\approx 2100$ img/s). Although slightly lower than the lightweight GeoMesa ($\approx 2500$ img/s) due to the dual-table write overhead, it demonstrates linear scalability suitable for high-velocity streaming data. Regarding storage cost (Fig.~\ref{fig:index_exp4}(b)), our index occupies approximately 0.83\% of the raw data size. While this is higher than GeoMesa (0.15\%) and PostGIS (0.51\%) due to the storage of grid-window mappings, it remains strictly below the 1\% threshold. This result validates that the proposed method achieves significant performance gains with a negligible storage penalty.
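The stable ingestion throughput follows from the append-only nature of the dual-table update: each new image only appends rows, with no tree rebalancing. A minimal sketch, under the same assumed dictionary-backed tables (the function name and layout are illustrative, not the system's API):

```python
def ingest(g2i, i2g, image_id, footprint_cells):
    """Append-only dual-table update for one image.

    footprint_cells: list of (grid cell, pixel window) pairs covering
    the image footprint. Both writes are appends, so the per-image cost
    is constant in the dataset size -- unlike R-tree insertion, which
    pays a logarithmic rebalancing cost as the index grows.
    """
    for cell, window in footprint_cells:
        g2i.setdefault(cell, []).append(image_id)   # grid -> images
    i2g[image_id] = list(footprint_cells)           # image -> windows
```

The second write is what produces the dual-table overhead relative to a single Z-order table, and the stored grid-window mappings account for the slightly larger ($\approx 0.83\%$) index footprint.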
\subsection{Evaluating the Concurrency Control}
In this section, we evaluate the proposed hybrid coordination mechanism on a distributed storage cluster to assess its scalability, robustness under contention, and internal storage efficiency.
To systematically control the workload characteristics, we developed a synthetic workload generator. We define the Spatial Overlap Ratio ($\sigma$) to quantify the extent of shared data regions among concurrent queries, ranging from $\sigma=0$ (disjoint) to $\sigma=0.9$ (highly concentrated hotspots). The number of concurrent clients varies from $N=1$ to $N=64$. It is worth noting that, given the data-intensive nature of retrievals where a single request triggers GB-scale I/O and complex decoding, 64 concurrent streams are sufficient to fully saturate the aggregate I/O bandwidth and CPU resources of our experimental cluster, representing a heavy-load scenario in operational scientific computing environments.
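One way to realize such a generator is to let each query draw a $\sigma$ fraction of its grid cells from a shared hotspot region and the remainder from a client-private region. This is a sketch of that construction under our own assumptions about the cell layout; it is not the paper's actual generator:

```python
import random

def make_workload(n_clients, sigma, hotspot_cells=100, query_size=50, seed=0):
    """Generate one grid-cell query set per client.

    sigma = 0   -> queries are pairwise disjoint (private cells only).
    sigma = 0.9 -> 90% of each query is drawn from a shared hotspot,
                   concentrating accesses on the same cells.
    """
    rng = random.Random(seed)
    hotspot = list(range(hotspot_cells))          # shared region
    queries = []
    for c in range(n_clients):
        n_hot = int(round(sigma * query_size))
        private_lo = hotspot_cells + c * query_size  # disjoint per client
        q = rng.sample(hotspot, n_hot) \
            + list(range(private_lo, private_lo + query_size - n_hot))
        queries.append(set(q))
    return queries
```

Because each query's hotspot portion is sampled from a small shared pool, raising $\sigma$ directly increases the pairwise overlap among concurrent queries.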
For comparison, we evaluate the following execution schemes:
Fig.~\ref{fig:cc_exp3}(a) plots the total physical data read size.
Fig.~\ref{fig:cc_exp3}(b) illustrates the number of backend I/O requests (IOPS). The baseline generates distinct I/O requests for every client, reaching 12,800 requests at $N=64$. This massive influx of small random reads overwhelms the storage scheduler, causing excessive disk head seek latency. Our system keeps the request count low and stable. Even at peak load ($N=64$), the number of physical I/O requests is suppressed to 2,000.
Fig.~\ref{fig:cc_exp3}(a) and Fig.~\ref{fig:cc_exp3}(b) demonstrate the Request Collapse effect. While 64 concurrent clients generate 12,800 IOPS in the baseline, our system collapses them into fewer than 600 physical operations.
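The essence of Request Collapse is that concurrent clients asking for the same window trigger a single backend read whose result is fanned out to all of them. The following sketch shows only the simplest case, deduplicating exact-duplicate window requests; a production coordinator would additionally merge partially overlapping ranges, and the function shown is our illustration rather than the system's implementation:

```python
def collapse(requests):
    """Collapse concurrent client reads into deduplicated physical I/Os.

    requests: list of (client_id, object_id, (offset, length)) tuples.
    Returns the sorted list of unique physical reads and a fan-out map
    (object, window) -> clients awaiting that read, so N identical
    client requests cost exactly one backend I/O.
    """
    fanout = {}
    for client, obj, window in requests:
        fanout.setdefault((obj, window), []).append(client)
    return sorted(fanout), fanout
```

Under a hotspot workload this is what flattens the IOPS curve: the physical request count tracks the number of distinct windows, not the number of clients.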
\subsubsection{Deterministic and Non-Deterministic Modes}\label{sec:ModeSwitch}
\label{fig:cc_exp4}
\end{figure}
To validate the necessity of the proposed hybrid switching mechanism, we compared our adaptive approach against two static execution policies: non-deterministic mode and deterministic mode. We swept the spatial overlap ratio from 0 to 0.9 to capture the performance crossover zone. The workload concurrency was fixed at a moderate level ($N=16$) to highlight the sensitivity to spatial contention.
Fig.~\ref{fig:cc_exp4} depicts the average latency curves for the three strategies. The results demonstrate that neither static policy is universally optimal, validating the design of our contention-aware switch. In the low-overlap condition, the non-deterministic mode achieves the lower latency. In contrast, the deterministic mode incurs a higher baseline latency. This performance gap is attributed to the inherent coordination overhead. Even when no conflicts exist, the deterministic scheduler must still perform time-window batching and plan sorting. Consequently, enforcing global ordering for disjoint queries introduces unnecessary serialization latency.
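A contention-aware switch of this kind can be sketched as a rule that measures the observed overlap among in-flight queries and picks the mode accordingly. The mean-Jaccard metric and the threshold value below are our illustrative assumptions, not the paper's tuned parameters:

```python
import itertools

def choose_mode(queries, threshold=0.3):
    """Pick an execution mode from the observed pairwise overlap.

    queries: list of sets of grid cells touched by in-flight queries.
    Low mean Jaccard overlap -> non-deterministic mode (no batching or
    plan-sorting overhead); high overlap -> deterministic mode (global
    ordering pays off by enabling shared reads on the hotspot).
    """
    pairs = list(itertools.combinations(queries, 2))
    if not pairs:
        return "non-deterministic"
    mean_overlap = sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
    return "deterministic" if mean_overlap >= threshold else "non-deterministic"
```

This reproduces the crossover behavior in Fig.~\ref{fig:cc_exp4}: disjoint workloads avoid the fixed batching and sorting cost, while hotspot workloads accept it in exchange for coordinated, conflict-free access.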