龙澳
2026-02-10 10:34:58 +08:00
parent fc3177732b
commit 1bd2f32b09
4 changed files with 3 additions and 3 deletions


@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.141592653-2.6-1.40.25 (MiKTeX 23.4) (preloaded format=pdflatex 2025.10.23) 5 FEB 2026 17:12
+This is pdfTeX, Version 3.141592653-2.6-1.40.25 (MiKTeX 23.4) (preloaded format=pdflatex 2025.10.23) 5 FEB 2026 17:24
 entering extended mode
 restricted \write18 enabled.
 %&-line parsing enabled.
@@ -743,7 +743,7 @@ urier/ucrr8a.pfb><D:/software/ctex/MiKTeX/fonts/type1/urw/times/utmb8a.pfb><D:/
 software/ctex/MiKTeX/fonts/type1/urw/times/utmbi8a.pfb><D:/software/ctex/MiKTeX
 /fonts/type1/urw/times/utmr8a.pfb><D:/software/ctex/MiKTeX/fonts/type1/urw/time
 s/utmri8a.pfb>
-Output written on rs_retrieval.pdf (17 pages, 2532803 bytes).
+Output written on rs_retrieval.pdf (17 pages, 2533024 bytes).
 PDF statistics:
 440 PDF objects out of 1000 (max. 8388607)
 0 named destinations out of 1000 (max. 500000)

Binary file not shown.

Binary file not shown.


@@ -65,7 +65,7 @@ Remote sensing data management, Spatio-temporal range retrievals, I/O-aware inde
 Existing RS data management systems \cite{LEWIS17datacube, Yan21RS_manage1, liu24mstgi} typically decompose a spatio-temporal range retrieval into a decoupled two-phase execution model. The first phase is the metadata filtering phase, which utilizes spatio-temporal metadata (e.g., footprints, timestamps) to identify candidate image files that intersect the retrieval predicate. Recent advancements have transitioned from traditional tree-based indexes \cite{Strobl08PostGIS, Simoes16PostGIST} to scalable distributed schemes based on grid encodings and space-filling curves, such as GeoHash \cite{suwardi15geohash}, GeoSOT \cite{Yan21RS_manage1}, and GeoMesa \cite{hughes15geomesa, Li23TrajMesa}. By leveraging these high-dimensional indexing structures, the search complexity of the first phase has been effectively reduced to $O(\log N)$ or even $O(1)$, making metadata discovery extremely efficient even for billion-scale datasets.
-The second phase is the data extraction phase, where the system reads the actual pixel data from the identified raw image files stored in distributed file systems or object stores. A critical observation in modern high-performance RS analytics is that the primary system bottleneck has fundamentally shifted from the first phase to the second. While the metadata search completes in milliseconds, the end-to-end retrieval latency is now dominated by the massive I/O overhead required to fetch, decompress, and process large-scale raw images. Traditional systems attempted to reduce I/O overhead by pre-slicing tiles and building pyramids (e.g., approaches used in Google Earth Engine \cite{gorelick17GEE} that store metadata in HBase and serve pre-tiled image pyramids), but aggressive tiling increases management complexity and produces many small files. More recent Cloud-Optimized GeoTIFF (COG) formats and COG-aware frameworks \cite{LEWIS17datacube}, \cite{riotiler25riotiler} exploit internal overviews and window-based I/O to read only the portions of files that spatially intersect a retrieval.
+The second phase is the data extraction phase, where the system reads the actual pixel data from the identified raw image files stored in distributed file systems or object stores. A critical observation in modern high-performance RS analytics is that the primary system bottleneck has fundamentally shifted from the first phase to the second. While the metadata search completes in milliseconds, the end-to-end retrieval latency is now dominated by the massive I/O overhead required to fetch, decompress, and process large-scale raw images. Traditional systems attempted to reduce I/O overhead by pre-slicing tiles and building pyramids (e.g., approaches used in Google Earth Engine \cite{gorelick17GEE} that store metadata in HBase and serve pre-tiled image pyramids), but aggressive tiling increases management complexity and produces many small files.
+More recent Cloud-Optimized GeoTIFF (COG) formats and COG-aware frameworks \cite{LEWIS17datacube}, \cite{riotiler25riotiler} exploit internal overviews and window-based I/O to minimize I/O access. Window-based I/O, also known as partial data retrieval, refers to the technique of reading only a specific spatial subset of a raster dataset rather than loading the entire file into memory. This is achieved by defining a window (bounding box) that specifies the pixel coordinates corresponding to the geographical area of interest.
 While window-based I/O effectively reduces raw data transfer, it introduces a new computational burden due to the decoupling of logical indexing from physical storage. Current systems operate on a ``Search-then-Compute-then-Read'' model: after identifying candidate files, they must perform fine-grained, per-image geospatial computations at runtime to map retrieval coordinates to precise file offsets and clip boundaries. This runtime geometric resolution becomes computationally prohibitive when processing a large volume of candidate images, often negating the benefits of I/O reduction. Moreover, under concurrent workloads, the lack of coordination among these independent read requests leads to severe I/O contention and storage thrashing, rendering traditional indexing-centric optimizations insufficient for real-time applications.
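The space-filling-curve indexing behind the metadata filtering phase can be illustrated with a minimal Z-order (Morton) encoding sketch. This is a generic illustration, not the encoding of GeoHash, GeoSOT, or GeoMesa specifically; the grid resolution and function names are assumptions made for the example:

```python
def morton_encode(cell_x: int, cell_y: int, bits: int = 16) -> int:
    """Interleave the bits of two grid indices into one Z-order key.

    Nearby cells in 2-D space map to nearby keys in 1-D, which lets a
    sorted key-value store answer spatial range filters with short scans.
    """
    key = 0
    for i in range(bits):
        key |= ((cell_x >> i) & 1) << (2 * i)      # x bits -> even positions
        key |= ((cell_y >> i) & 1) << (2 * i + 1)  # y bits -> odd positions
    return key


def grid_cell(lon: float, lat: float, cell_deg: float = 0.01) -> tuple:
    """Quantize a lon/lat coordinate onto a fixed grid (illustrative resolution)."""
    return int((lon + 180.0) / cell_deg), int((lat + 90.0) / cell_deg)


# A footprint centroid becomes a single sortable key for the metadata index:
key = morton_encode(*grid_cell(116.39, 39.90))
```

A range retrieval then decomposes into a small number of key-range scans over such codes, which is what reduces the first phase to logarithmic or constant cost per candidate.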
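The coordinate mapping at the heart of window-based I/O can be sketched as follows. This is not the API of any particular COG reader; the north-up geotransform layout, square pixels, and the `Window` type are assumptions made for the example:

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """A pixel-space bounding box: which columns/rows to read from the raster."""
    col_off: int
    row_off: int
    width: int
    height: int


def window_from_bounds(minx, miny, maxx, maxy,
                       origin_x, origin_y, pixel_size):
    """Map a geographic bbox to the pixel window that covers it.

    Assumes a north-up raster: origin at the top-left corner, square
    pixels of `pixel_size` degrees, rows growing southward.
    """
    col_off = math.floor((minx - origin_x) / pixel_size)
    row_off = math.floor((origin_y - maxy) / pixel_size)
    col_end = math.ceil((maxx - origin_x) / pixel_size)
    row_end = math.ceil((origin_y - miny) / pixel_size)
    return Window(col_off, row_off, col_end - col_off, row_end - row_off)


# Image origin at (100.0 E, 40.0 N), 0.25-degree pixels; only the pixels
# inside this window are fetched, not the whole file:
w = window_from_bounds(100.5, 39.0, 101.0, 39.5, 100.0, 40.0, 0.25)
```

Reading only `w.width * w.height` pixels instead of the full raster is the I/O saving the COG-aware frameworks above exploit.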
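The per-image "Compute" step in the Search-then-Compute-then-Read model can be sketched as the intersection of a pixel window with a file's internal tile grid, which must run at retrieval time for every candidate image before any byte range can be requested. The tile size and image width here are illustrative assumptions, not the layout of any specific format:

```python
def tiles_for_window(col_off, row_off, width, height,
                     tile_size=256, tiles_per_row=8):
    """Return (tile_index, tile_col, tile_row) for every internal tile a
    pixel window touches.

    This mimics the runtime geometric resolution that maps retrieval
    coordinates to per-file read targets; repeated across thousands of
    candidate images, this per-image work becomes the CPU-side burden.
    """
    first_tc = col_off // tile_size
    last_tc = (col_off + width - 1) // tile_size
    first_tr = row_off // tile_size
    last_tr = (row_off + height - 1) // tile_size
    hits = []
    for tr in range(first_tr, last_tr + 1):
        for tc in range(first_tc, last_tc + 1):
            hits.append((tr * tiles_per_row + tc, tc, tr))
    return hits


# A 300x200-pixel window starting at pixel (300, 100) touches four of the
# 256x256 tiles in this (assumed) 8-tiles-wide image:
hits = tiles_for_window(300, 100, 300, 200)
```

Each hit still requires an offset-table lookup and a separate read request, so uncoordinated concurrent retrievals multiply both the computation and the number of small, scattered I/Os.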