diff --git a/.claude/settings.local.json b/.claude/settings.local.json index 3c31ae0..2fcb5f2 100644 --- a/.claude/settings.local.json +++ b/.claude/settings.local.json @@ -1,7 +1,7 @@ { "permissions": { "allow": [ - "Bash(python:*)" + "Bash(pdflatex:*)" ] } } diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 06f1408..0000000 --- a/CLAUDE.md +++ /dev/null @@ -1,182 +0,0 @@ -# CLAUDE.md - -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. - -## Project Overview - -This is a LaTeX research paper titled "An I/O-Efficient Approach for Concurrent Spatio-Temporal Range Retrievals over Large-Scale Remote Sensing Image Data" submitted to an IEEE journal. The paper proposes novel techniques for efficient retrieval of remote sensing imagery, including: - -- **Index-as-an-Execution-Plan paradigm**: Integrates fine-grained partial retrieval directly into indexing structure -- **Dual-layer inverted index (G2I/I2G)**: Pre-materializes grid-to-pixel mappings to eliminate runtime geometric calculations -- **Hybrid concurrency-aware I/O coordination**: Combines Calvin-style deterministic ordering with optimistic execution -- **SA-GMAB (Surrogate-Assisted Genetic Multi-Armed Bandit)**: Auto-tuning mechanism for fluctuating workloads - -## Build and Compilation - -### Primary Build Commands - -```bash -# Standard compilation (recommended for IEEE format) -pdflatex rs_retrieval.tex - -# Alternative compilation (being tested) -xelatex rs_retrieval.tex - -# Full build cycle (includes bibliography) -pdflatex rs_retrieval.tex -bibtex rs_retrieval -pdflatex rs_retrieval.tex -pdflatex rs_retrieval.tex -``` - -### Build System - -- **Compiler**: pdfTeX (MiKTeX distribution on Windows) -- **Document Class**: IEEEtran (IEEE journal format) -- **Output**: 15-page PDF (~2.36MB) -- **No automation**: No Makefile or build scripts - use manual compilation - -## Document Structure - -The paper follows standard IEEE journal 
organization with these main sections: - -1. **Introduction** - Motivation and problem statement -2. **Related Work** - Spatio-temporal retrieval, concurrency control, I/O tuning -3. **Problem Formulation** - Mathematical definitions and cost models -4. **I/O-aware Indexing Structure** (Section 3) - Core technical contribution - - Grid-to-Image (G2I) index - - Image-to-Grid (I2G) index - - Pre-materialized execution plans -5. **Hybrid Concurrency-Aware I/O Coordination** (Section 4) - - Deterministic vs optimistic execution modes - - Adaptive mode switching -6. **I/O Stack Tuning** (Section 5) - SA-GMAB algorithm -7. **Performance Evaluation** (Section 6) - Experimental results on Martian datasets -8. **Conclusions** - Summary of contributions - -### Key Files - -- `rs_retrieval.tex` - Main LaTeX source (single-file document) -- `references.bib` - Bibliography database -- `fig/` - Figures directory (index.png, st-query.png, cc.png) -- `exp/` - Experimental results (PDF charts) - -## LaTeX Package Dependencies - -### Required Packages - -```latex -\documentclass[lettersize,journal]{IEEEtran} -\usepackage{amsmath,amsfonts} % Mathematics -\usepackage{graphicx} % Figures -\usepackage[linesnumbered,lined,ruled]{algorithm2e} % Algorithms -\usepackage{cite} % Citations -\usepackage{array} % Table formatting -\usepackage{makecell} % Table cells -\usepackage{subfigure} % Subfigures -``` - -### Chinese Language Support - -- The project directory name includes Chinese characters (遥感影像部分检索) -- Document content is in English -- Uses ctex distribution (Chinese TeX) on the system - -## Document Conventions - -### Cross-References - -All sections use `\label{}` and `\ref{}` for cross-referencing: -- Section labels: `sec:XX` format (e.g., `\label{sec:Index}`) -- Algorithm labels: `alg:XX` format -- Figure labels: `fig:XX` format -- Equation labels: `eq:XX` format - -### Mathematical Notation - -- Extensive use of mathematical formulations -- Cost models use notation: $C_{total}$, 
$T_{compute}$, etc. -- Algorithm pseudo-code uses algorithm2e package - -### Citation Style - -- IEEE citation style with numeric references -- Citations in format: `\cite{AuthorYearKEY}` -- Bibliography managed in `references.bib` - -### Figure Organization - -Figures are organized by topic: -- `fig/index.png` - Index schema design -- `fig/st-query.png` - Retrieval-time execution flow -- `fig/cc.png` - Concurrency coordination mechanism - -## Common Editing Tasks - -### Adding a New Section - -1. Add `\section{Section Name}` with `\label{sec:NAME}` -2. Update the table of contents/organization paragraph in Introduction -3. Ensure cross-references use correct label format - -### Modifying Algorithms - -- Use `algorithm2e` environment -- Keep `linesnumbered,lined,ruled` options for consistency -- Label with `\label{alg:NAME}` for referencing - -### Adding Figures - -1. Place figure files in `fig/` directory -2. Use `\begin{figure}[t]` for top placement (IEEE convention) -3. Include `\caption{}` and `\label{fig:NAME}` -4. Refer using `\ref{fig:NAME}` - -### Bibliography Updates - -1. Add entries to `references.bib` -2. Use BibTeX key format: `AuthorYearKEY` (e.g., `Ma15RS_bigdata`) -3. Run `bibtex rs_retrieval` after modifying .bib file -4. 
Compile LaTeX twice to resolve references - -## Important Notes - -### Compilation Workflow - -When making changes that affect: -- **Text only**: Single `pdflatex` run sufficient -- **Citations**: Run `pdflatex` → `bibtex` → `pdflatex` × 2 -- **New sections/labels**: Run `pdflatex` twice to resolve cross-references -- **Figures**: Ensure all figure files exist before compilation - -### Git Repository - -- Main branch: `main` -- Recent activity: Testing XeLaTeX compilation -- Modified files tracked: .tex, .pdf, .aux, .log, .synctex.gz - -### Document Formatting - -- Strict IEEE journal format compliance -- Font: Times Roman family -- Two-column layout -- Letter size paper -- 15-page final document - -### Known Issues - -- Some font variants (bold/italic) unavailable in current TeX distribution -- Testing migration from pdflatex to xelatex (commit f7ffed8) - -## Experimental Data Reference - -The paper evaluates on Martian remote sensing datasets: -- **Total volume**: 51.9 TB across 669,641 images -- **Datasets**: MoRIC, CTX, THEMIS, HiRISE -- **Environment**: 9-node cluster with HBase and Lustre file system -- **Metrics**: Latency, I/O throughput, request collapse efficiency - -Results show: -- Order-of-magnitude latency reduction with I/O-aware indexing -- 54x speedup under high contention with hybrid coordination -- 2x faster recovery from workload shifts with SA-GMAB diff --git a/rs_retrieval.aux b/rs_retrieval.aux index 926d5d4..10ec728 100644 --- a/rs_retrieval.aux +++ b/rs_retrieval.aux @@ -69,9 +69,9 @@ \@writefile{toc}{\contentsline {section}{\numberline {VI}Hybrid Concurrency-Aware I/O Coordination}{7}{}\protected@file@percent } \newlabel{sec:CC}{{VI}{7}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VI-A}}Retrieval Admission and I/O Plan Generation}{7}{}\protected@file@percent } -\newlabel{eq:io_plan}{{8}{7}} +\newlabel{eqn:io_plan}{{8}{7}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VI-B}}Contention Estimation and 
Path Selection}{7}{}\protected@file@percent } -\newlabel{eqn_tuning_table}{{9}{7}} +\newlabel{eqn_tuning_overlap}{{9}{7}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VI-C}}Deterministic Coordinated and Non-deterministic Execution}{7}{}\protected@file@percent } \@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Hybrid Concurrency-Aware I/O Coordination.}}{8}{}\protected@file@percent } \newlabel{fig:cc}{{4}{8}} @@ -82,8 +82,8 @@ \@writefile{toc}{\contentsline {section}{\numberline {VII}I/O Stack Tuning}{8}{}\protected@file@percent } \newlabel{sec:Tuning}{{VII}{8}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VII-A}}Formulation of Online I/O Tuning}{9}{}\protected@file@percent } -\newlabel{eqn_tuning_table}{{10}{9}} -\newlabel{eqn_tuning_table}{{11}{9}} +\newlabel{eqn_tuning_performance}{{10}{9}} +\newlabel{eqn_tuning_objective}{{11}{9}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VII-B}}Surrogate-Assisted GMAB for Online I/O Tuning}{9}{}\protected@file@percent } \citation{Xie12supercomputer} \@writefile{loa}{\contentsline {algocf}{\numberline {1}{\ignorespaces Surrogate-Assisted Genetic Multi-Armed Bandit (SA-GMAB)}}{10}{}\protected@file@percent } @@ -105,8 +105,10 @@ \newlabel{sec_exp_env}{{\mbox {VIII-A}3}{11}} \@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-A}3}Experimental Environment}{11}{}\protected@file@percent } \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VIII-B}}Evaluating the Data Indexing Structure}{11}{}\protected@file@percent } -\newlabel{fig:index_exp1_1}{{\mbox {VIII-B}1}{11}} -\newlabel{fig:index_exp1_2}{{\mbox {VIII-B}1}{11}} +\newlabel{fig:index_exp1_1}{{6(a)}{11}} +\newlabel{sub@fig:index_exp1_1}{{(a)}{11}} +\newlabel{fig:index_exp1_2}{{6(b)}{11}} +\newlabel{sub@fig:index_exp1_2}{{(b)}{11}} \@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces The efficiency of I/O selectivity}}{11}{}\protected@file@percent } 
\@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Query footprint ratios}}}{11}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Query spatial extents}}}{11}{}\protected@file@percent } @@ -115,33 +117,39 @@ \@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-B}1}I/O Selectivity Analysis}{11}{}\protected@file@percent } \newlabel{sec:Index_exp_2}{{\mbox {VIII-B}2}{11}} \@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-B}2}End-to-End Retrieval Latency}{11}{}\protected@file@percent } -\newlabel{fig:index_exp2_1}{{\mbox {VIII-B}2}{12}} -\newlabel{fig:index_exp2_2}{{\mbox {VIII-B}2}{12}} +\newlabel{fig:index_exp2_1}{{7(a)}{12}} +\newlabel{sub@fig:index_exp2_1}{{(a)}{12}} +\newlabel{fig:index_exp2_2}{{7(b)}{12}} +\newlabel{sub@fig:index_exp2_2}{{(b)}{12}} \@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces End-to-End retrieval latency}}{12}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Query footprint ratios}}}{12}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Query footprint ratios}}}{12}{}\protected@file@percent } \newlabel{fig:index_exp2}{{7}{12}} \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Latency breakdown}}{12}{}\protected@file@percent } \newlabel{fig:index_exp2_3}{{8}{12}} -\newlabel{fig:index_exp3_1}{{\mbox {VIII-B}3}{12}} -\newlabel{fig:index_exp3_2}{{\mbox {VIII-B}3}{12}} +\newlabel{fig:index_exp3_1}{{9(a)}{12}} +\newlabel{sub@fig:index_exp3_1}{{(a)}{12}} +\newlabel{fig:index_exp3_2}{{9(b)}{12}} +\newlabel{sub@fig:index_exp3_2}{{(b)}{12}} \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Ablation analysis}}{12}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {I/O reduction analysis}}}{12}{}\protected@file@percent } 
\@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Latency breakdown}}}{12}{}\protected@file@percent } \newlabel{fig:index_exp3}{{9}{12}} \newlabel{sec:Index_exp_3}{{\mbox {VIII-B}3}{12}} \@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-B}3}Ablation Study}{12}{}\protected@file@percent } +\citation{Thomson12Calvin} \@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Impact of grid resolution on query latency}}{13}{}\protected@file@percent } \newlabel{fig:index_exp3_3}{{10}{13}} -\newlabel{fig:index_exp4_2}{{\mbox {VIII-B}4}{13}} -\newlabel{fig:index_exp4_1}{{\mbox {VIII-B}4}{13}} +\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-B}4}Index Construction and Storage Overhead}{13}{}\protected@file@percent } +\newlabel{fig:index_exp4_2}{{11(a)}{13}} +\newlabel{sub@fig:index_exp4_2}{{(a)}{13}} +\newlabel{fig:index_exp4_1}{{11(b)}{13}} +\newlabel{sub@fig:index_exp4_1}{{(b)}{13}} \@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Index construction and storage overhead}}{13}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Ingested images ($10^4$)}}}{13}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Various index types}}}{13}{}\protected@file@percent } \newlabel{fig:index_exp4}{{11}{13}} -\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-B}4}Index Construction and Storage Overhead}{13}{}\protected@file@percent } \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VIII-C}}Evaluating the Concurrency Control}{13}{}\protected@file@percent } -\citation{Thomson12Calvin} \citation{Wu25OOCC} \citation{Habibi25Brook2PL} \@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-C}1}Concurrency Scalability}{14}{}\protected@file@percent } @@ -168,27 +176,30 @@ \@writefile{lof}{\contentsline 
{subfigure}{\numberline{(b)}{\ignorespaces {$\sigma =0.6$}}}{15}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(c)}{\ignorespaces {$\sigma =0.8$}}}{15}{}\protected@file@percent } \newlabel{fig:cc_exp1p95}{{13}{15}} -\newlabel{fig:cc_exp3_1}{{\mbox {VIII-C}2}{15}} -\newlabel{fig:cc_exp3_2}{{\mbox {VIII-C}2}{15}} +\newlabel{fig:cc_exp3_1}{{14(a)}{15}} +\newlabel{sub@fig:cc_exp3_1}{{(a)}{15}} +\newlabel{fig:cc_exp3_2}{{14(b)}{15}} +\newlabel{sub@fig:cc_exp3_2}{{(b)}{15}} \@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces The data volume reduction and request collapse}}{15}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {The number of clients}}}{15}{}\protected@file@percent } \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {The number of clients}}}{15}{}\protected@file@percent } \newlabel{fig:cc_exp3}{{14}{15}} -\newlabel{sec:ModeSwitch}{{\mbox {VIII-C}3}{15}} -\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-C}3}Deterministic and Non-Deterministic Modes}{15}{}\protected@file@percent } \@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Mode Switching}}{15}{}\protected@file@percent } \newlabel{fig:cc_exp4}{{15}{15}} +\newlabel{sec:ModeSwitch}{{\mbox {VIII-C}3}{15}} +\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-C}3}Deterministic and Non-Deterministic Modes}{15}{}\protected@file@percent } \citation{Behzad13HDF5} \citation{Robert20SA} \citation{Agarwal19TPE} \citation{Bagbaba20RF} \citation{Rajesh24TunIO} -\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VIII-D}}Evaluating the I/O Tuning}{16}{}\protected@file@percent } -\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-D}1}Convergence Speed and Tuning Cost}{16}{}\protected@file@percent } \@writefile{lof}{\contentsline {figure}{\numberline {16}{\ignorespaces Efficiency analysis of the 
tuning framework.}}{16}{}\protected@file@percent } \newlabel{fig:tune_exp1}{{16}{16}} +\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VIII-D}}Evaluating the I/O Tuning}{16}{}\protected@file@percent } +\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-D}1}Convergence Speed and Tuning Cost}{16}{}\protected@file@percent } \@writefile{lof}{\contentsline {figure}{\numberline {17}{\ignorespaces Latency evolution under workload shift}}{16}{}\protected@file@percent } \newlabel{fig:tune_exp3}{{17}{16}} +\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-D}2}Adaptation to Workload Shifts}{16}{}\protected@file@percent } \bibstyle{IEEEtran} \bibdata{IEEEabrv,references} \bibcite{Ma15RS_bigdata}{1} @@ -205,12 +216,11 @@ \bibcite{riotiler25riotiler}{12} \bibcite{Thomson12Calvin}{13} \bibcite{Lim17OCC}{14} -\@writefile{toc}{\contentsline {subsubsection}{\numberline {\mbox {VIII-D}2}Adaptation to Workload Shifts}{17}{}\protected@file@percent } +\bibcite{Rajesh24TunIO}{15} +\bibcite{Preil25GMAB}{16} \@writefile{toc}{\contentsline {section}{\numberline {IX}Conclusions}{17}{}\protected@file@percent } \newlabel{sec:Con}{{IX}{17}} \@writefile{toc}{\contentsline {section}{References}{17}{}\protected@file@percent } -\bibcite{Rajesh24TunIO}{15} -\bibcite{Preil25GMAB}{16} \bibcite{Tang12Quad-Tree}{17} \bibcite{Yang24GridMesa}{18} \bibcite{hong2025deterministic}{19} diff --git a/rs_retrieval.log b/rs_retrieval.log index cd64978..6e4821a 100644 --- a/rs_retrieval.log +++ b/rs_retrieval.log @@ -1,4 +1,4 @@ -This is pdfTeX, Version 3.141592653-2.6-1.40.25 (MiKTeX 23.4) (preloaded format=pdflatex 2025.10.23) 28 FEB 2026 15:30 +This is pdfTeX, Version 3.141592653-2.6-1.40.25 (MiKTeX 23.4) (preloaded format=pdflatex 2025.10.23) 3 MAR 2026 09:39 entering extended mode restricted \write18 enabled. %&-line parsing enabled. 
@@ -340,14 +340,7 @@ File: l3backend-pdftex.def 2023-03-30 L3 backend support: PDF output (pdfTeX) LaTeX Warning: Unused global option(s): [lettersize]. -(rs_retrieval.aux - -LaTeX Warning: Label `eqn_tuning_table' multiply defined. - - -LaTeX Warning: Label `eqn_tuning_table' multiply defined. - -) +(rs_retrieval.aux) \openout1 = `rs_retrieval.aux'. LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 29. @@ -536,22 +529,22 @@ Package pdftex.def Info: exp/index_exp1_1.pdf used on input line 553. File: exp/index_exp1_2.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp1_2.pdf used on input line 559. +Package pdftex.def Info: exp/index_exp1_2.pdf used on input line 558. (pdftex.def) Requested size: 111.27748pt x 96.32248pt. File: exp/index_exp2_1.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp2_1.pdf used on input line 575. +Package pdftex.def Info: exp/index_exp2_1.pdf used on input line 573. (pdftex.def) Requested size: 111.27748pt x 95.44283pt. File: exp/index_exp2_2.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp2_2.pdf used on input line 581. +Package pdftex.def Info: exp/index_exp2_2.pdf used on input line 578. (pdftex.def) Requested size: 111.27748pt x 95.44283pt. File: exp/index_exp2_3.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp2_3.pdf used on input line 589. +Package pdftex.def Info: exp/index_exp2_3.pdf used on input line 585. (pdftex.def) Requested size: 130.08621pt x 99.23824pt. [11 <./exp/index_exp1_1.pdf> <./exp/index_exp1_2.pdf @@ -561,17 +554,17 @@ iple pdfs with page group included in a single page File: exp/index_exp3_1.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp3_1.pdf used on input line 607. +Package pdftex.def Info: exp/index_exp3_1.pdf used on input line 603. (pdftex.def) Requested size: 105.4204pt x 81.6675pt. 
File: exp/index_exp3_2.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp3_2.pdf used on input line 613. +Package pdftex.def Info: exp/index_exp3_2.pdf used on input line 608. (pdftex.def) Requested size: 105.4204pt x 82.08418pt. File: exp/index_exp3_3.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp3_3.pdf used on input line 621. +Package pdftex.def Info: exp/index_exp3_3.pdf used on input line 615. (pdftex.def) Requested size: 130.08621pt x 105.92268pt. [12 <./exp/index_exp2_1.pdf> <./exp/index_exp2_2.pdf @@ -593,12 +586,12 @@ iple pdfs with page group included in a single page File: exp/index_exp4_2.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp4_2.pdf used on input line 639. +Package pdftex.def Info: exp/index_exp4_2.pdf used on input line 633. (pdftex.def) Requested size: 114.79138pt x 88.47545pt. File: exp/index_exp4_1.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/index_exp4_1.pdf used on input line 645. +Package pdftex.def Info: exp/index_exp4_1.pdf used on input line 638. (pdftex.def) Requested size: 106.5929pt x 85.10497pt. [13 <./exp/index_exp3_3.pdf> <./exp/index_exp4_2.pdf @@ -612,49 +605,55 @@ iple pdfs with page group included in a single page File: exp/cc_exp1_3_mean.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp1_3_mean.pdf used on input line 675. +Package pdftex.def Info: exp/cc_exp1_3_mean.pdf used on input line 667. (pdftex.def) Requested size: 151.76744pt x 133.17691pt. File: exp/cc_exp1_2_mean.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp1_2_mean.pdf used on input line 677. +Package pdftex.def Info: exp/cc_exp1_2_mean.pdf used on input line 669. (pdftex.def) Requested size: 151.76744pt x 133.17691pt. File: exp/cc_exp1_1_mean.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp1_1_mean.pdf used on input line 681. +Package pdftex.def Info: exp/cc_exp1_1_mean.pdf used on input line 673. 
(pdftex.def) Requested size: 151.76744pt x 133.17691pt. File: exp/cc_exp1_3_p95.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp1_3_p95.pdf used on input line 691. +Package pdftex.def Info: exp/cc_exp1_3_p95.pdf used on input line 683. (pdftex.def) Requested size: 151.76744pt x 133.17691pt. File: exp/cc_exp1_2_p95.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp1_2_p95.pdf used on input line 693. +Package pdftex.def Info: exp/cc_exp1_2_p95.pdf used on input line 685. (pdftex.def) Requested size: 151.76744pt x 133.17691pt. File: exp/cc_exp1_1_p95.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp1_1_p95.pdf used on input line 697. +Package pdftex.def Info: exp/cc_exp1_1_p95.pdf used on input line 689. (pdftex.def) Requested size: 151.76744pt x 133.17691pt. File: exp/cc_exp3_1.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp3_1.pdf used on input line 724. +Package pdftex.def Info: exp/cc_exp3_1.pdf used on input line 716. (pdftex.def) Requested size: 106.5929pt x 92.68857pt. File: exp/cc_exp3_2.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp3_2.pdf used on input line 730. +Package pdftex.def Info: exp/cc_exp3_2.pdf used on input line 721. (pdftex.def) Requested size: 111.27748pt x 92.36403pt. [14] File: exp/cc_exp4.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/cc_exp4.pdf used on input line 747. +Package pdftex.def Info: exp/cc_exp4.pdf used on input line 737. (pdftex.def) Requested size: 130.08621pt x 108.5792pt. 
+ +Underfull \vbox (badness 2644) has occurred while \output is active [] + + +Underfull \vbox (badness 4556) has occurred while \output is active [] + [15 <./exp/cc_exp1_3_mean.pdf> <./exp/cc_exp1_2_mean.pdf pdfTeX warning: pdflatex.exe (file ./exp/cc_exp1_2_mean.pdf): PDF inclusion: mu @@ -688,7 +687,7 @@ e pdfs with page group included in a single page pdfTeX warning: pdflatex.exe (file ./exp/cc_exp4.pdf): PDF inclusion: multiple pdfs with page group included in a single page >] -Underfull \hbox (badness 2469) in paragraph at lines 770--771 +Underfull \hbox (badness 2469) in paragraph at lines 760--761 []\OT1/ptm/b/n/10 TPE \OT1/ptm/m/n/10 [35]: A model-based se-quen-tial op-ti-mi za-tion [] @@ -696,65 +695,61 @@ za-tion File: exp/tune_exp1_1.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/tune_exp1_1.pdf used on input line 782. +Package pdftex.def Info: exp/tune_exp1_1.pdf used on input line 772. (pdftex.def) Requested size: 130.08621pt x 116.20644pt. File: exp/tune_exp2_1.pdf Graphic file (type pdf) -Package pdftex.def Info: exp/tune_exp2_1.pdf used on input line 797. +Package pdftex.def Info: exp/tune_exp2_1.pdf used on input line 787. (pdftex.def) Requested size: 130.08621pt x 117.2348pt. 
[16 <./exp/tune_exp1_1.pdf> <./exp/tune_exp2_1.pdf pdfTeX warning: pdflatex.exe (file ./exp/tune_exp2_1.pdf): PDF inclusion: multi ple pdfs with page group included in a single page >] -Underfull \hbox (badness 2495) in paragraph at lines 820--821 +Underfull \hbox (badness 2495) in paragraph at lines 810--811 []\OT1/ptm/m/n/10 This work is sup-ported by the Na-tional Key R&D [] -Underfull \hbox (badness 2799) in paragraph at lines 820--821 +Underfull \hbox (badness 2799) in paragraph at lines 810--811 \OT1/ptm/m/n/10 Pro-gram of China ``In-ter-gov-ern-men-tal In-ter-na-tional Sci - [] -Underfull \hbox (badness 7576) in paragraph at lines 820--821 +Underfull \hbox (badness 7576) in paragraph at lines 810--811 \OT1/ptm/m/n/10 ence and Tech-nol-ogy In-no-va-tion Co-op-er-a-tion" (Grant [] -(rs_retrieval.bbl [17]) [18] (rs_retrieval.aux) - -LaTeX Warning: There were multiply-defined labels. - - ) +(rs_retrieval.bbl [17]) [18] (rs_retrieval.aux) ) Here is how much of TeX's memory you used: - 5798 strings out of 476331 - 99238 string characters out of 5797649 + 5810 strings out of 476331 + 99501 string characters out of 5797649 1882660 words of memory out of 5000000 - 26099 multiletter control sequences out of 15000+600000 + 26111 multiletter control sequences out of 15000+600000 561830 words of font info for 131 fonts, out of 8000000 for 9000 1145 hyphenation exceptions out of 8191 - 62i,17n,67p,2484b,497s stack positions out of 10000i,1000n,20000p,200000b,200000s - -Output written on rs_retrieval.pdf (18 pages, 2549070 bytes). + 62i,17n,67p,2516b,497s stack positions out of 10000i,1000n,20000p,200000b,200000s +< +D:/software/ctex/MiKTeX/fonts/type1/public/amsfonts/cm/cmmi8.pfb> +Output written on rs_retrieval.pdf (18 pages, 2549056 bytes). PDF statistics: 486 PDF objects out of 1000 (max. 8388607) 0 named destinations out of 1000 (max. 
500000) diff --git a/rs_retrieval.pdf b/rs_retrieval.pdf index e76f303..854f36f 100644 Binary files a/rs_retrieval.pdf and b/rs_retrieval.pdf differ diff --git a/rs_retrieval.synctex.gz b/rs_retrieval.synctex.gz index 44751c3..1626010 100644 Binary files a/rs_retrieval.synctex.gz and b/rs_retrieval.synctex.gz differ diff --git a/rs_retrieval.tex b/rs_retrieval.tex index 586c5e0..c003a81 100644 --- a/rs_retrieval.tex +++ b/rs_retrieval.tex @@ -126,7 +126,7 @@ Definition 2 (Spatio-temporal Range Retrieval). Given a dataset $\mathbb{R}$, a \vspace{-0.05in} \begin{equation} \label{eqn:pre_st_query} - \mathcal{R}_Q=R\in \mathbb{R}\mid MBR\left( R \right) \cap S\ne \emptyset \land R.t\cap T\ne \emptyset . + \mathcal{R}_Q=\{R\in \mathbb{R}\mid MBR\left( R \right) \cap S\ne \emptyset \land R.t\cap T\ne \emptyset \}. \end{equation} For each $R \in \mathcal{R}_Q$, the system must return the pixel matrix corresponding to the intersection region $MBR(R) \cap S$. @@ -162,9 +162,9 @@ subject to: \label{fig:overview} \end{figure} -To address the challenges of storage-level I/O contention and expensive runtime computations, we propose a layered distributed retrieval framework. As illustrated in Fig. \ref{fig:overview}, the system architecture is composed of four primary processing components: (1) \emph{requst interface}, (2) \emph{index manager}, (3) \emph{I/O coordinator}, (4) \emph{data loader}, and (5) \emph{adaptive tuner}. +To address the challenges of storage-level I/O contention and expensive runtime computations, we propose a layered distributed retrieval framework. As illustrated in Fig.~\ref{fig:overview}, the system architecture is composed of five primary processing components: (1) \emph{request interface}, (2) \emph{index manager}, (3) \emph{I/O coordinator}, (4) \emph{data loader}, and (5) \emph{adaptive tuner}. -The $\emph{requst interface}$ serves as the system entry point. It is responsible for accepting concurrent spatio-temporal retrievals.
The $\emph{index manager}$ acts as the planner of the system, interacting with the metadata storage. It translates logical spatio-temporal predicates into physical storage locations using a dual-layer inverted index. The $\emph{I/O coordinator}$ serves as the traffic control layer. It detects spatial overlaps among concurrent reading plans to identify potential I/O conflicts and applies the hybrid concurrency-aware protocol to reorder or merge conflicting requests. Finally, the $\emph{data loader}$ interface with the distributed file system or object store to read the pixel data. What's more, \emph{adaptive tuner} optimizes the execution parameters in the background. +The \emph{request interface} serves as the system entry point and accepts concurrent spatio-temporal retrievals. The \emph{index manager} acts as the planner of the system and interacts with the metadata storage; it translates logical spatio-temporal predicates into physical storage locations using a dual-layer inverted index. The \emph{I/O coordinator} serves as the traffic control layer: it detects spatial overlaps among concurrent read plans to identify potential I/O conflicts, and applies the hybrid concurrency-aware protocol to reorder or merge conflicting requests. The \emph{data loader} interfaces with the distributed file system or object store to read the pixel data. Finally, the \emph{adaptive tuner} optimizes the execution parameters in the background. \section{I/O-aware Indexing Structure}\label{sec:Index} This section introduces the details of the indexing structure for spatio-temporal range retrieval over RS data.
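The corrected set-builder form of Definition 2 (label `eqn:pre_st_query`) in the hunk above is a conjunction of two intersection tests: one spatial (MBR against the query range $S$) and one temporal (acquisition interval against $T$). The sketch below is a minimal Python sketch of that predicate, assuming axis-aligned MBRs and closed time intervals. It also shows the clipping of $MBR(R) \cap S$ that determines which pixels are returned. The names `Box`, `Image`, `st_range_retrieval`, and `clip_window` are hypothetical illustrations, not part of the paper's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned rectangle [minx, maxx] x [miny, maxy]."""
    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other: "Box") -> bool:
        # Closed intervals: rectangles that only touch still count as intersecting.
        return (self.minx <= other.maxx and other.minx <= self.maxx
                and self.miny <= other.maxy and other.miny <= self.maxy)


@dataclass(frozen=True)
class Image:
    """Hypothetical image record: MBR plus acquisition interval [t0, t1]."""
    image_id: str
    mbr: Box
    t0: float
    t1: float


def st_range_retrieval(dataset, S, T):
    """R_Q = {R in dataset | MBR(R) ∩ S != ∅ and R.t ∩ T != ∅}."""
    t_start, t_end = T
    return [R for R in dataset
            if R.mbr.intersects(S) and R.t0 <= t_end and t_start <= R.t1]


def clip_window(R, S):
    """Region MBR(R) ∩ S whose pixels are returned; assumes R.mbr intersects S."""
    m = R.mbr
    return Box(max(m.minx, S.minx), max(m.miny, S.miny),
               min(m.maxx, S.maxx), min(m.maxy, S.maxy))


if __name__ == "__main__":
    data = [Image("a", Box(0, 0, 10, 10), 0, 5),
            Image("b", Box(20, 20, 30, 30), 0, 5),
            Image("c", Box(5, 5, 15, 15), 10, 12)]
    S, T = Box(8, 8, 12, 12), (4, 11)
    for R in st_range_retrieval(data, S, T):
        print(R.image_id, clip_window(R, S))
```

In this toy dataset, images `a` and `c` qualify. Image `b` fails the spatial test. With $T = (6, 9)$ instead, no image qualifies, because neither time interval overlaps it. A production index would replace the linear scan with the G2I lookup; the predicate itself stays the same.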
@@ -285,7 +285,7 @@ When a spatio-temporal range retrieval $Q$ arrives, the system first performs in As a result, each retrieval is translated into an explicit \textit{I/O access plan} consisting of image–window pairs: \vspace{-0.05in} \begin{equation} - \label{eq:io_plan} + \label{eqn:io_plan} Plan\left( Q \right) =\left\{ \left( img_1,w_1 \right) ,\left( img_1,w_2 \right) ,\left( img_3,w_5 \right) ,... \right\}, \end{equation} where each window $w$ denotes a concrete pixel range to be accessed via byte-range I/O. Upon admission, the system assigns each retrieval a unique \textit{RetrievalID} and records its arrival timestamp. @@ -297,7 +297,7 @@ Let $A(Plan(Q_i))$ be the aggregate spatial area of all pixel windows in the I/O \vspace{-0.05in} \begin{equation} \vspace{-0.05in} - \label{eqn_tuning_table} + \label{eqn_tuning_overlap} \sigma = 1 - \frac{\text{A}(\bigcup_{i=1}^n Plan(Q_i))}{\sum_{i=1}^n \text{A}(Plan(Q_i))}, \end{equation} where $\sigma \in [0, 1]$. A high $\sigma$ indicates that multiple retrievals are competing for the same image regions, leading to high I/O amplification if executed independently. @@ -358,7 +358,7 @@ For a given tuning configuration $\theta $ and execution context $c$, the observ \vspace{-0.05in} \begin{equation} \vspace{-0.05in} - \label{eqn_tuning_table} + \label{eqn_tuning_performance} Y\left( \theta ,c \right) =f\left( \theta ,c \right) +\epsilon , \end{equation} where $f\left( \cdot \right) $ is an unknown performance function and $\epsilon$ captures stochastic noise. Moreover, as retrieval workloads evolve over time, the distribution of execution contexts $c$ may change, making the tuning problem non-stationary. 
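The overlap ratio $\sigma$ defined under label `eqn_tuning_overlap` in the hunk above compares the area of the union of all concurrent I/O plans with the sum of their individual areas. The following minimal Python sketch makes this concrete under one hypothetical assumption: every window in a plan is a half-open rectangle of grid cells within a single image, so area becomes a cell count. The `Window` encoding and the function names are illustrative only. Each plan is deduplicated before summing, so overlapping windows inside one plan are counted once.

```python
from itertools import product

# Hypothetical encoding: (image_id, (r0, c0, r1, c1)) covers grid cells
# r0 <= r < r1 and c0 <= c < c1 of that image; area is measured in cells.
Window = tuple[str, tuple[int, int, int, int]]


def plan_cells(plan: list[Window]) -> set[tuple[str, int, int]]:
    """Distinct (image, row, col) cells touched by one I/O plan."""
    cells = set()
    for img, (r0, c0, r1, c1) in plan:
        for r, c in product(range(r0, r1), range(c0, c1)):
            cells.add((img, r, c))
    return cells


def overlap_ratio(plans: list[list[Window]]) -> float:
    """sigma = 1 - A(union of plans) / sum_i A(Plan(Q_i))."""
    per_plan = [plan_cells(p) for p in plans]
    total = sum(len(s) for s in per_plan)  # sum_i A(Plan(Q_i))
    if total == 0:
        return 0.0  # no data requested, so no overlap to measure
    union = set().union(*per_plan)  # A(union_i Plan(Q_i))
    return 1.0 - len(union) / total


q1 = [("img_1", (0, 0, 2, 2))]  # 4 cells
q2 = [("img_1", (1, 1, 3, 3))]  # 4 cells, shares cell (1, 1) with q1
print(overlap_ratio([q1, q2]))  # 1 - 7/8 = 0.125
```

Because the union is at least as large as the largest single plan, and therefore at least the average plan, $\sigma$ cannot exceed $1 - 1/n$ for $n$ plans with non-zero total area. This bound is reached when all $n$ plans are identical. For example, two copies of `q1` give $\sigma = 0.5$, and disjoint plans give $\sigma = 0$.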
@@ -367,7 +367,7 @@ Given a stream of retrievals $\mathcal{Q}$ and the resulting sequence of executi \vspace{-0.05in} \begin{equation} \vspace{-0.05in} - \label{eqn_tuning_table} + \label{eqn_tuning_objective} \min_{\left\{ \theta _t \right\}}\mathbb{E}\left[ \sum_{t=1}^T{Y}\left( \theta _t,c_t \right) \right] , \end{equation} subject to practical constraints on tuning overhead and system stability. @@ -546,40 +546,36 @@ We categorize the baseline systems into two groups based on their data retrieval \begin{figure}[tb] \centering - \subfigure[Query footprint ratios]{ + \subfigure[Query footprint ratios\label{fig:index_exp1_1}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.95\textwidth]{exp/index_exp1_1.pdf} \end{minipage} } - \label{fig:index_exp1_1} - \subfigure[Query spatial extents]{ + \subfigure[Query spatial extents\label{fig:index_exp1_2}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.95\textwidth]{exp/index_exp1_2.pdf} \end{minipage} } - \label{fig:index_exp1_2} \caption{The efficiency of I/O selectivity} \label{fig:index_exp1} \end{figure} -First, we evaluated the effectiveness of data reduction by measuring the I/O selectivity, defined as the ratio of the retrieved data volume to the total file size. Fig.~\ref{fig:index_exp1} compares our method against baselines. As illustrated in Fig.~\ref{fig:index_exp1}(a), systems such as PostGIS, GeoMesa, and MSTGI, which rely on full file loading, exhibit consistent performance. They always reads the entire image regardless of the proportion of the intersection between the query range and the image. In contrast, OpenDataCube, Rio-tiler, and ours significantly reduce I/O traffic by enabling partial reads. It is worth noting that our method incurs slightly higher I/O volume compared to the theoretically optimal baseline (OpenDataCube and Rio-tiler). 
This marginal data redundancy is attributed to the grid alignment effect: our index retrieves pixel blocks based on fixed grid boundaries, whereas OpenDataCube and Rio-tiler perform precise geospatial clipping. Fig.~\ref{fig:index_exp1}(b) further presents the distribution of unnecessary data fraction. While our method introduces a small amount of over-reading due to grid padding, it successfully avoids the massive data waste observed in the full-file retrieval systems. As we will demonstrate in the next section, this slight compromise in I/O precision is a strategic trade-off that eliminates expensive runtime computations. +First, we evaluated the effectiveness of data reduction by measuring the I/O selectivity, defined as the ratio of the retrieved data volume to the total file size. Fig.~\ref{fig:index_exp1} compares our method against the baselines. As illustrated in Fig.~\ref{fig:index_exp1}(a), systems that rely on full file loading, such as PostGIS, GeoMesa, and MSTGI, exhibit constant I/O selectivity: they always read the entire image, regardless of how much of it intersects the query range. In contrast, OpenDataCube, Rio-tiler, and our method significantly reduce I/O traffic by enabling partial reads. Notably, our method incurs slightly higher I/O volume than the theoretically optimal baselines (OpenDataCube and Rio-tiler). This marginal redundancy stems from the grid alignment effect: our index retrieves pixel blocks along fixed grid boundaries, whereas OpenDataCube and Rio-tiler perform precise geospatial clipping. Fig.~\ref{fig:index_exp1}(b) further presents the distribution of the unnecessary data fraction. Although our method over-reads a small amount of data due to grid padding, it avoids the massive data waste observed in the full-file retrieval systems.
As we will demonstrate in the next section, this slight compromise in I/O precision is a strategic trade-off that eliminates expensive runtime computations. \subsubsection{End-to-End Retrieval Latency}\label{sec:Index_exp_2} \begin{figure}[tb] \centering - \subfigure[Query footprint ratios]{ + \subfigure[Query footprint ratios\label{fig:index_exp2_1}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.95\textwidth]{exp/index_exp2_1.pdf} \end{minipage} } - \label{fig:index_exp2_1} - \subfigure[Query footprint ratios]{ + \subfigure[Query footprint ratios\label{fig:index_exp2_2}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.95\textwidth]{exp/index_exp2_2.pdf} \end{minipage} } - \label{fig:index_exp2_2} \caption{End-to-End retrieval latency} \label{fig:index_exp2} \end{figure} @@ -600,18 +596,16 @@ To empirically validate the cost model in Eq.~\ref{eqn:cost_total}, we decompose \subsubsection{Ablation Study}\label{sec:Index_exp_3} \begin{figure}[tb] \centering - \subfigure[I/O reduction analysis]{ + \subfigure[I/O reduction analysis\label{fig:index_exp3_1}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.9\textwidth]{exp/index_exp3_1.pdf} \end{minipage} } - \label{fig:index_exp3_1} - \subfigure[Latency breakdown]{ + \subfigure[Latency breakdown\label{fig:index_exp3_2}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.9\textwidth]{exp/index_exp3_2.pdf} \end{minipage} } - \label{fig:index_exp3_2} \caption{Ablation analysis} \label{fig:index_exp3} \end{figure} @@ -632,18 +626,16 @@ Moreover, the choice of grid resolution (Zoom Level) is a critical parameter tha \subsubsection{Index Construction and Storage Overhead} \begin{figure}[tb] \centering - \subfigure[Ingested images ($10^4$)]{ + \subfigure[Ingested images ($10^4$)\label{fig:index_exp4_2}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.98\textwidth]{exp/index_exp4_2.pdf} \end{minipage} } - \label{fig:index_exp4_2} - \subfigure[Various index 
types]{ + \subfigure[Various index types\label{fig:index_exp4_1}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.91\textwidth]{exp/index_exp4_1.pdf} \end{minipage} } - \label{fig:index_exp4_1} \caption{Index construction and storage overhead} \label{fig:index_exp4} \end{figure} @@ -717,18 +709,16 @@ The results reveal that traditional concurrency control mechanisms fail to addre \subsubsection{Storage-Level Effects and Request Collapse} \begin{figure}[tb] \centering - \subfigure[The number of clients]{ + \subfigure[The number of clients\label{fig:cc_exp3_1}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.91\textwidth]{exp/cc_exp3_1.pdf} \end{minipage} } - \label{fig:cc_exp3_1} - \subfigure[The number of clients]{ + \subfigure[The number of clients\label{fig:cc_exp3_2}]{ \begin{minipage}[b]{0.227\textwidth} \includegraphics[width=0.95\textwidth]{exp/cc_exp3_2.pdf} \end{minipage} } - \label{fig:cc_exp3_2} \caption{The data volume reduction and request collapse} \label{fig:cc_exp3} \end{figure}