Fix grammatical errors

2026-03-03 09:41:33 +08:00
parent 5e87d77c70
commit 2a98a4ef24
7 changed files with 108 additions and 295 deletions
--- a/rs_retrieval.tex
+++ b/rs_retrieval.tex
@@ -126,7 +126,7 @@ Definition 2 (Spatio-temporal Range Retrieval). Given a dataset $\mathbb{R}$, a
 \vspace{-0.05in}
 \begin{equation}
 	\label{eqn:pre_st_query}
-	\mathcal{R}_Q=R\in \mathbb{R}\mid MBR\left( R \right) \cap S\ne \emptyset \land R.t\cap T\ne \emptyset .
+	\mathcal{R}_Q=\{R\in \mathbb{R}\mid MBR\left( R \right) \cap S\ne \emptyset \land R.t\cap T\ne \emptyset \}.
 \end{equation}

 For each $R \in \mathcal{R}_Q$, the system must return the pixel matrix corresponding to the intersection region $MBR(R) \cap S$.
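As a minimal sketch of Definition 2 and Eq.~\ref{eqn:pre_st_query}, the Python below filters a toy dataset by the spatial and temporal predicates and, for each match, returns the clip region $MBR(R) \cap S$ whose pixels would be returned. It assumes axis-aligned MBRs given as `(min_x, min_y, max_x, max_y)` and closed time intervals `(t_start, t_end)`; the `Record` type and its fields are hypothetical and not the system's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Interval = Tuple[float, float]           # (t_start, t_end), closed


@dataclass
class Record:
    # Hypothetical record layout: image id, its MBR, and its acquisition interval.
    rid: str
    mbr: Box
    t: Interval


def box_intersection(a: Box, b: Box) -> Optional[Box]:
    """Closed intersection of two axis-aligned boxes, or None if they are disjoint."""
    lo_x, lo_y = max(a[0], b[0]), max(a[1], b[1])
    hi_x, hi_y = min(a[2], b[2]), min(a[3], b[3])
    if lo_x > hi_x or lo_y > hi_y:
        return None
    return (lo_x, lo_y, hi_x, hi_y)


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def range_retrieval(dataset: List[Record], S: Box, T: Interval) -> List[Tuple[str, Box]]:
    """R_Q = {R | MBR(R) ∩ S ≠ ∅ ∧ R.t ∩ T ≠ ∅}, each paired with MBR(R) ∩ S."""
    result = []
    for r in dataset:
        clip = box_intersection(r.mbr, S)
        if clip is not None and intervals_overlap(r.t, T):
            result.append((r.rid, clip))
    return result


if __name__ == "__main__":
    data = [
        Record("img_a", (0, 0, 10, 10), (1, 5)),
        Record("img_b", (8, 8, 20, 20), (6, 9)),
        Record("img_c", (30, 30, 40, 40), (1, 9)),
    ]
    # img_c is excluded because its MBR does not intersect S.
    print(range_retrieval(data, S=(5, 5, 12, 12), T=(4, 7)))
```

The linear scan is only meant to pin down the semantics of the predicate; the index described in the paper exists precisely to avoid scanning every record.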
@@ -162,9 +162,9 @@ subject to:
 	\label{fig:overview}
 \end{figure}

-To address the challenges of storage-level I/O contention and expensive runtime computations, we propose a layered distributed retrieval framework. As illustrated in Fig. \ref{fig:overview}, the system architecture is composed of four primary processing components: (1) \emph{requst interface}, (2) \emph{index manager}, (3) \emph{I/O coordinator}, (4) \emph{data loader}, and (5) \emph{adaptive tuner}.
+To address the challenges of storage-level I/O contention and expensive runtime computations, we propose a layered distributed retrieval framework. As illustrated in Fig. \ref{fig:overview}, the system architecture is composed of five primary processing components: (1) \emph{request interface}, (2) \emph{index manager}, (3) \emph{I/O coordinator}, (4) \emph{data loader}, and (5) \emph{adaptive tuner}.

-The $\emph{requst interface}$ serves as the system entry point. It is responsible for accepting concurrent spatio-temporal retrievals. The $\emph{index manager}$ acts as the planner of the system, interacting with the metadata storage. It translates logical spatio-temporal predicates into physical storage locations using a dual-layer inverted index. The $\emph{I/O coordinator}$ serves as the traffic control layer. It detects spatial overlaps among concurrent reading plans to identify potential I/O conflicts and applies the hybrid concurrency-aware protocol to reorder or merge conflicting requests. Finally, the $\emph{data loader}$ interface with the distributed file system or object store to read the pixel data. What's more, \emph{adaptive tuner} optimizes the execution parameters in the background.
+The \emph{request interface} serves as the system entry point and accepts concurrent spatio-temporal retrieval requests. The \emph{index manager} acts as the planner of the system: interacting with the metadata storage, it translates logical spatio-temporal predicates into physical storage locations using a dual-layer inverted index. The \emph{I/O coordinator} serves as the traffic control layer. It detects spatial overlaps among concurrent read plans to identify potential I/O conflicts and applies the hybrid concurrency-aware protocol to reorder or merge conflicting requests. The \emph{data loader} interfaces with the distributed file system or object store to read the pixel data. Finally, the \emph{adaptive tuner} optimizes the execution parameters in the background.

 \section{I/O-aware Indexing Structure}\label{sec:Index}
 This section introduces the details of the indexing structure for spatio-temporal range retrieval over RS data.
@@ -285,7 +285,7 @@ When a spatio-temporal range retrieval $Q$ arrives, the system first performs in
 As a result, each retrieval is translated into an explicit \textit{I/O access plan} consisting of image–window pairs:
 \vspace{-0.05in}
 \begin{equation}
-	\label{eq:io_plan}
+	\label{eqn:io_plan}
 	Plan\left( Q \right) =\left\{ \left( img_1,w_1 \right) ,\left( img_1,w_2 \right) ,\left( img_3,w_5 \right) ,... \right\},
 \end{equation}
 where each window $w$ denotes a concrete pixel range to be accessed via byte-range I/O. Upon admission, the system assigns each retrieval a unique \textit{RetrievalID} and records its arrival timestamp.
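The I/O access plan $Plan(Q)$ and the admission step can be pictured as a small data structure. The sketch below is illustrative only: the half-open `(x0, y0, x1, y1)` pixel-window layout, integer RetrievalIDs drawn from a counter, and `time.monotonic()` as the arrival timestamp are assumptions, not details taken from the system; `AdmissionControl` and `AdmittedRetrieval` are hypothetical names.

```python
import itertools
import time
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical window layout: half-open pixel bounds (x0, y0, x1, y1).
Window = Tuple[int, int, int, int]
PlanT = List[Tuple[str, Window]]  # Plan(Q): image-window pairs


@dataclass(frozen=True)
class AdmittedRetrieval:
    retrieval_id: int
    arrival_ts: float
    plan: Tuple[Tuple[str, Window], ...]


class AdmissionControl:
    """Stamps each incoming plan with a unique RetrievalID and its arrival time."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)  # increasing integer IDs starting at 1

    def admit(self, plan: PlanT) -> AdmittedRetrieval:
        return AdmittedRetrieval(
            retrieval_id=next(self._next_id),
            arrival_ts=time.monotonic(),  # monotonic clock: unaffected by wall-clock changes
            plan=tuple(plan),             # freeze the plan once admitted
        )


if __name__ == "__main__":
    ac = AdmissionControl()
    q1 = ac.admit([("img_1", (0, 0, 256, 256)), ("img_1", (256, 0, 512, 256)), ("img_3", (0, 512, 256, 768))])
    q2 = ac.admit([("img_1", (0, 0, 256, 256))])
    print(q1.retrieval_id, q2.retrieval_id, len(q1.plan))  # prints: 1 2 3
```

Recording the arrival timestamp alongside the plan gives a downstream coordinator a deterministic order in which to compare and reorder concurrent plans.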
@@ -297,7 +297,7 @@ Let $A(Plan(Q_i))$ be the aggregate spatial area of all pixel windows in the I/O
 \vspace{-0.05in}
 \begin{equation}
 	\vspace{-0.05in}
-	\label{eqn_tuning_table}
+	\label{eqn_tuning_overlap}
 	\sigma = 1 - \frac{\text{A}(\bigcup_{i=1}^n Plan(Q_i))}{\sum_{i=1}^n \text{A}(Plan(Q_i))},
 \end{equation}
 where $\sigma \in [0, 1]$. A high $\sigma$ indicates that multiple retrievals are competing for the same image regions, leading to high I/O amplification if executed independently.
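To make the overlap ratio $\sigma$ concrete, the following sketch computes it for a set of I/O plans. It assumes that windows are half-open pixel rectangles `(x0, y0, x1, y1)`, that $A(\cdot)$ is the covered pixel area with overlapping windows counted once, and that windows on different images never overlap one another; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # half-open pixel rectangle (x0, y0, x1, y1)
Plan = List[Tuple[str, Rect]]     # image-window pairs, as in Plan(Q)


def union_area(rects: List[Rect]) -> int:
    """Area of the union of axis-aligned rectangles via coordinate compression."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx0, cx1, cy0, cy1 = xs[i], xs[i + 1], ys[j], ys[j + 1]
            # A compressed cell is covered if some rectangle contains it entirely.
            if any(r[0] <= cx0 and cx1 <= r[2] and r[1] <= cy0 and cy1 <= r[3] for r in rects):
                area += (cx1 - cx0) * (cy1 - cy0)
    return area


def plan_area(plans: List[Plan]) -> int:
    """A(.) of the union of the given plans, computed per image and then summed."""
    per_image: Dict[str, List[Rect]] = defaultdict(list)
    for plan in plans:
        for img, rect in plan:
            per_image[img].append(rect)
    return sum(union_area(rects) for rects in per_image.values())


def overlap_ratio(plans: List[Plan]) -> float:
    """sigma = 1 - A(union_i Plan(Q_i)) / sum_i A(Plan(Q_i))."""
    total = sum(plan_area([p]) for p in plans)
    if total == 0:
        return 0.0  # no pixels requested: define sigma as 0
    return 1.0 - plan_area(plans) / total


if __name__ == "__main__":
    q1 = [("img_1", (0, 0, 100, 100))]
    q2 = [("img_1", (50, 0, 150, 100))]  # half of q1's window is requested again
    q3 = [("img_3", (0, 0, 100, 100))]   # different image, no overlap
    print(overlap_ratio([q1, q2, q3]))
```

Two plans that request the same window yield $\sigma = 0.5$, while plans touching disjoint regions yield $\sigma = 0$. The coordinate-compression union is roughly cubic in the number of windows per image, so it is suited to illustration rather than to a production coordinator.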
@@ -358,7 +358,7 @@ For a given tuning configuration $\theta $ and execution context $c$, the observ
 \vspace{-0.05in}
 \begin{equation}
 	\vspace{-0.05in}
-	\label{eqn_tuning_table}
+	\label{eqn_tuning_performance}
 	Y\left( \theta ,c \right) =f\left( \theta ,c \right) +\epsilon ,
 \end{equation}
 where $f\left( \cdot \right) $ is an unknown performance function and $\epsilon$ captures stochastic noise. Moreover, as retrieval workloads evolve over time, the distribution of execution contexts $c$ may change, making the tuning problem non-stationary.
@@ -367,7 +367,7 @@ Given a stream of retrievals $\mathcal{Q}$ and the resulting sequence of executi
 \vspace{-0.05in}
 \begin{equation}
 	\vspace{-0.05in}
-	\label{eqn_tuning_table}
+	\label{eqn_tuning_objective}
 	\min_{\left\{ \theta _t \right\}}\mathbb{E}\left[ \sum_{t=1}^T{Y}\left( \theta _t,c_t \right) \right] ,
 \end{equation}
 subject to practical constraints on tuning overhead and system stability.
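The objective above leaves the tuning policy open. Purely to illustrate the problem setting, and not as the adaptive tuner proposed in this work, the sketch below chooses $\theta_t$ from a small discrete set with a generic sliding-window epsilon-greedy rule (its exploration probability is unrelated to the noise term $\epsilon$). Only the noisy outputs $Y(\theta_t, c_t)$ drive the choice, and the synthetic performance function, the context shift, and all parameter values are hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence


def run_tuner(
    configs: Sequence[int],
    observe: Callable[[int, int], float],
    contexts: Sequence[int],
    explore_prob: float = 0.1,
    window: int = 20,
    seed: int = 0,
) -> List[float]:
    """Pick theta_t for each context c_t and return the observed Y(theta_t, c_t).

    observe(theta, c) plays the role of the noisy measurement Y = f(theta, c) + noise.
    The selection rule uses only these outputs; it does not model c.
    """
    rng = random.Random(seed)
    recent: Dict[int, Deque[float]] = {th: deque(maxlen=window) for th in configs}
    observed: List[float] = []
    for c in contexts:
        untried = [th for th in configs if not recent[th]]
        if untried:
            theta = untried[0]  # try every configuration once
        elif rng.random() < explore_prob:
            theta = rng.choice(list(configs))  # occasional exploration
        else:
            # Exploit: lowest mean over the most recent `window` observations,
            # so old measurements age out when the workload drifts.
            theta = min(configs, key=lambda th: sum(recent[th]) / len(recent[th]))
        y = observe(theta, c)
        recent[theta].append(y)
        observed.append(y)
    return observed


if __name__ == "__main__":
    noise = random.Random(1)

    def observe(theta: int, c: int) -> float:
        # Hypothetical f: latency is lowest when theta matches the context c.
        return (theta - c) ** 2 + 10 + noise.gauss(0, 1)

    contexts = [4] * 100 + [8] * 100  # the workload shifts halfway through
    ys = run_tuner([2, 4, 6, 8], observe, contexts)
    fixed = sum(observe(2, c) for c in contexts)
    print(f"cumulative Y, adaptive: {sum(ys):.0f}; fixed theta=2: {fixed:.0f}")
```

Because each configuration's estimate uses only its most recent observations, the rule can move away from a configuration once the context shifts; rarely tried configurations still keep stale estimates, and the exploration cost is not bounded, which a tuner respecting the stated overhead and stability constraints would have to address.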
@@ -546,40 +546,36 @@ We categorize the baseline systems into two groups based on their data retrieval

 \begin{figure}[tb]
 	\centering
-	\subfigure[Query footprint ratios]{
+	\subfigure[Query footprint ratios\label{fig:index_exp1_1}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.95\textwidth]{exp/index_exp1_1.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp1_1}
-	\subfigure[Query spatial extents]{
+	\subfigure[Query spatial extents\label{fig:index_exp1_2}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.95\textwidth]{exp/index_exp1_2.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp1_2}
 	\caption{The efficiency of I/O selectivity}
 	\label{fig:index_exp1}
 \end{figure}

-First, we evaluated the effectiveness of data reduction by measuring the I/O selectivity, defined as the ratio of the retrieved data volume to the total file size. Fig.~\ref{fig:index_exp1} compares our method against baselines. As illustrated in Fig.~\ref{fig:index_exp1}(a), systems such as PostGIS, GeoMesa, and MSTGI, which rely on full file loading, exhibit consistent performance. They always reads the entire image regardless of the proportion of the intersection between the query range and the image. In contrast, OpenDataCube, Rio-tiler, and ours significantly reduce I/O traffic by enabling partial reads. It is worth noting that our method incurs slightly higher I/O volume compared to the theoretically optimal baseline (OpenDataCube and Rio-tiler). This marginal data redundancy is attributed to the grid alignment effect: our index retrieves pixel blocks based on fixed grid boundaries, whereas OpenDataCube and Rio-tiler perform precise geospatial clipping. Fig.~\ref{fig:index_exp1}(b) further presents the distribution of unnecessary data fraction. While our method introduces a small amount of over-reading due to grid padding, it successfully avoids the massive data waste observed in the full-file retrieval systems. As we will demonstrate in the next section, this slight compromise in I/O precision is a strategic trade-off that eliminates expensive runtime computations.
+First, we evaluate the effectiveness of data reduction by measuring I/O selectivity, defined as the ratio of the retrieved data volume to the total file size. Fig.~\ref{fig:index_exp1} compares our method against the baselines. As illustrated in Fig.~\ref{fig:index_exp1}(a), the systems that rely on full-file loading, namely PostGIS, GeoMesa, and MSTGI, exhibit a constant I/O selectivity: they always read the entire image, regardless of how much of the image the query range actually covers. In contrast, OpenDataCube, Rio-tiler, and our method significantly reduce I/O traffic by enabling partial reads. Our method incurs a slightly higher I/O volume than OpenDataCube and Rio-tiler, which approach the theoretical optimum. This marginal redundancy stems from grid alignment: our index retrieves pixel blocks along fixed grid boundaries, whereas OpenDataCube and Rio-tiler perform precise geospatial clipping. Fig.~\ref{fig:index_exp1}(b) further shows the distribution of the fraction of unnecessary data read. Although grid padding causes a small amount of over-reading in our method, it avoids the massive data waste observed in the full-file retrieval systems. As we demonstrate in the next section, this slight loss of I/O precision is a deliberate trade-off that eliminates expensive runtime computations.

 \subsubsection{End-to-End Retrieval Latency}\label{sec:Index_exp_2}

 \begin{figure}[tb]
 	\centering
-	\subfigure[Query footprint ratios]{
+	\subfigure[Query footprint ratios\label{fig:index_exp2_1}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.95\textwidth]{exp/index_exp2_1.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp2_1}
-	\subfigure[Query footprint ratios]{
+	\subfigure[Query footprint ratios\label{fig:index_exp2_2}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.95\textwidth]{exp/index_exp2_2.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp2_2}
 	\caption{End-to-End retrieval latency}
 	\label{fig:index_exp2}
 \end{figure}
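The grid-alignment effect behind the slight over-read reported for Fig.~\ref{fig:index_exp1} can be quantified with a short calculation. The sketch below snaps a query window outward to block boundaries and reports the resulting I/O selectivity and over-read fraction; the block size, the image extent, and the use of pixel area as a proxy for bytes (uniform pixel size, no compression) are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Tuple

Window = Tuple[int, int, int, int]  # half-open pixel bounds (x0, y0, x1, y1)


def snap_to_grid(w: Window, block: int, width: int, height: int) -> Window:
    """Expand a window outward to block boundaries, clamped to the image extent."""
    x0, y0, x1, y1 = w
    return (
        (x0 // block) * block,
        (y0 // block) * block,
        min(-(-x1 // block) * block, width),   # ceiling division, then clamp
        min(-(-y1 // block) * block, height),
    )


def area(w: Window) -> int:
    return max(0, w[2] - w[0]) * max(0, w[3] - w[1])


def read_stats(w: Window, block: int, width: int, height: int) -> Tuple[float, float]:
    """Return (I/O selectivity, over-read fraction) for a grid-aligned read."""
    aligned = snap_to_grid(w, block, width, height)
    if area(aligned) == 0:
        return 0.0, 0.0
    selectivity = area(aligned) / (width * height)  # data read / whole image
    over_read = 1 - area(w) / area(aligned)         # share of read pixels outside the query
    return selectivity, over_read


if __name__ == "__main__":
    print(read_stats((300, 300, 1300, 1300), block=256, width=10000, height=10000))
```

With a 256-pixel block, a $1000 \times 1000$ window starting at pixel (300, 300) expands to $1280 \times 1280$, so about 39\% of the pixels read fall outside the query, while the read still covers under 2\% of a $10000 \times 10000$ image. Smaller blocks would reduce this padding but increase the number of blocks to index.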
@@ -600,18 +596,16 @@ To empirically validate the cost model in Eq.~\ref{eqn:cost_total}, we decompose
 \subsubsection{Ablation Study}\label{sec:Index_exp_3}
 \begin{figure}[tb]
 	\centering
-	\subfigure[I/O reduction analysis]{
+	\subfigure[I/O reduction analysis\label{fig:index_exp3_1}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.9\textwidth]{exp/index_exp3_1.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp3_1}
-	\subfigure[Latency breakdown]{
+	\subfigure[Latency breakdown\label{fig:index_exp3_2}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.9\textwidth]{exp/index_exp3_2.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp3_2}
 	\caption{Ablation analysis}
 	\label{fig:index_exp3}
 \end{figure}
@@ -632,18 +626,16 @@ Moreover, the choice of grid resolution (Zoom Level) is a critical parameter tha
 \subsubsection{Index Construction and Storage Overhead}
 \begin{figure}[tb]
 	\centering
-	\subfigure[Ingested images ($10^4$)]{
+	\subfigure[Ingested images ($10^4$)\label{fig:index_exp4_2}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.98\textwidth]{exp/index_exp4_2.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp4_2}
-	\subfigure[Various index types]{
+	\subfigure[Various index types\label{fig:index_exp4_1}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.91\textwidth]{exp/index_exp4_1.pdf}
 		\end{minipage}
 	}
-	\label{fig:index_exp4_1}
 	\caption{Index construction and storage overhead}
 	\label{fig:index_exp4}
 \end{figure}
@@ -717,18 +709,16 @@ The results reveal that traditional concurrency control mechanisms fail to addre
 \subsubsection{Storage-Level Effects and Request Collapse}
 \begin{figure}[tb]
 	\centering
-	\subfigure[The number of clients]{
+	\subfigure[The number of clients\label{fig:cc_exp3_1}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.91\textwidth]{exp/cc_exp3_1.pdf}
 		\end{minipage}
 	}
-	\label{fig:cc_exp3_1}
-	\subfigure[The number of clients]{
+	\subfigure[The number of clients\label{fig:cc_exp3_2}]{
 		\begin{minipage}[b]{0.227\textwidth}
 			\includegraphics[width=0.95\textwidth]{exp/cc_exp3_2.pdf}
 		\end{minipage}
 	}
-	\label{fig:cc_exp3_2}
 	\caption{The data volume reduction and request collapse}
 	\label{fig:cc_exp3}
 \end{figure}