# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a LaTeX research paper titled "An I/O-Efficient Approach for Concurrent Spatio-Temporal Range Retrievals over Large-Scale Remote Sensing Image Data" submitted to an IEEE journal. The paper proposes novel techniques for efficient retrieval of remote sensing imagery, including:
- **Index-as-an-Execution-Plan paradigm**: Integrates fine-grained partial retrieval directly into indexing structure
- **Dual-layer inverted index (G2I/I2G)**: Pre-materializes grid-to-pixel mappings to eliminate runtime geometric calculations
- **Hybrid concurrency-aware I/O coordination**: Combines Calvin-style deterministic ordering with optimistic execution
- **SA-GMAB (Surrogate-Assisted Genetic Multi-Armed Bandit)**: Auto-tuning mechanism for fluctuating workloads
## Build and Compilation
### Primary Build Commands
```bash
# Standard compilation (recommended for IEEE format)
pdflatex rs_retrieval.tex
# Alternative compilation (being tested)
xelatex rs_retrieval.tex
# Full build cycle (includes bibliography)
pdflatex rs_retrieval.tex
bibtex rs_retrieval
pdflatex rs_retrieval.tex
pdflatex rs_retrieval.tex
```
### Build System
- **Compiler**: pdfTeX (MiKTeX distribution on Windows)
- **Document Class**: IEEEtran (IEEE journal format)
- **Output**: 15-page PDF (~2.36 MB)
- **No automation**: No Makefile or build scripts - use manual compilation
## Document Structure
The paper follows standard IEEE journal organization with these main sections:
1. **Introduction** - Motivation and problem statement
2. **Related Work** - Spatio-temporal retrieval, concurrency control, I/O tuning
3. **Problem Formulation** - Mathematical definitions and cost models
4. **I/O-aware Indexing Structure** (Section 3) - Core technical contribution
   - Grid-to-Image (G2I) index
   - Image-to-Grid (I2G) index
   - Pre-materialized execution plans
5. **Hybrid Concurrency-Aware I/O Coordination** (Section 4)
   - Deterministic vs. optimistic execution modes
   - Adaptive mode switching
6. **I/O Stack Tuning** (Section 5) - SA-GMAB algorithm
7. **Performance Evaluation** (Section 6) - Experimental results on Martian datasets
8. **Conclusions** - Summary of contributions
### Key Files
- `rs_retrieval.tex` - Main LaTeX source (single-file document)
- `references.bib` - Bibliography database
- `fig/` - Figures directory (index.png, st-query.png, cc.png)
- `exp/` - Experimental results (PDF charts)
## LaTeX Package Dependencies
### Required Packages
```latex
\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts} % Mathematics
\usepackage{graphicx} % Figures
\usepackage[linesnumbered,lined,ruled]{algorithm2e} % Algorithms
\usepackage{cite} % Citations
\usepackage{array} % Table formatting
\usepackage{makecell} % Table cells
\usepackage{subfigure} % Subfigures
```
### Chinese Language Support
- The project directory name includes Chinese characters (遥感影像部分检索)
- Document content is in English
- Uses ctex distribution (Chinese TeX) on the system
## Document Conventions
### Cross-References
All sections use `\label{}` and `\ref{}` for cross-referencing:
- Section labels: `sec:XX` format (e.g., `\label{sec:Index}`)
- Algorithm labels: `alg:XX` format
- Figure labels: `fig:XX` format
- Equation labels: `eq:XX` format
### Mathematical Notation
- Extensive use of mathematical formulations
- Cost models use notation: $C_{total}$, $T_{compute}$, etc.
- Algorithm pseudo-code uses algorithm2e package
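
As a minimal sketch of how a labeled equation might look under these conventions: the symbols $C_{total}$ and $T_{compute}$ come from the notes above, while $T_{io}$, the additive form, and the label `eq:cost` are placeholders for illustration only, not the paper's actual cost model.

```latex
% Placeholder equation: the right-hand side is NOT the paper's cost model.
% T_{io} and the label eq:cost are hypothetical names.
\begin{equation}
  C_{total} = T_{io} + T_{compute}
  \label{eq:cost}
\end{equation}

% Referencing it in running text (\eqref is provided by amsmath):
The total cost is given by~\eqref{eq:cost}.
```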
### Citation Style
- IEEE citation style with numeric references
- Citations in format: `\cite{AuthorYearKEY}`
- Bibliography managed in `references.bib`
### Figure Organization
Figures are organized by topic:
- `fig/index.png` - Index schema design
- `fig/st-query.png` - Retrieval-time execution flow
- `fig/cc.png` - Concurrency coordination mechanism
## Common Editing Tasks
### Adding a New Section
1. Add `\section{Section Name}` with `\label{sec:NAME}`
2. Update the table of contents/organization paragraph in Introduction
3. Ensure cross-references use correct label format
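
The steps above could look roughly like the following sketch. The section title and the label `sec:Discussion` are hypothetical; `sec:Index` is the example label mentioned under Cross-References.

```latex
% Hypothetical new section; title and label name are placeholders.
\section{Discussion}
\label{sec:Discussion}

% Cross-reference to an existing section using the sec:XX label format.
As described in Section~\ref{sec:Index}, the index structure is built offline.
```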
### Modifying Algorithms
- Use `algorithm2e` environment
- Keep `linesnumbered,lined,ruled` options for consistency
- Label with `\label{alg:NAME}` for referencing
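
A minimal `algorithm2e` skeleton that follows these conventions might look like the sketch below. The caption, the label `alg:Example`, and the pseudo-code body (including the `Lookup` helper) are placeholders and do not reproduce any algorithm from the paper.

```latex
% Assumes \usepackage[linesnumbered,lined,ruled]{algorithm2e} in the preamble.
\begin{algorithm}[t]
  \caption{Placeholder algorithm title}
  \label{alg:Example}
  \KwIn{query range $Q$}
  \KwOut{result set $R$}
  $R \leftarrow \emptyset$\;
  % Illustrative loop only; Lookup is a hypothetical helper.
  \ForEach{grid cell $g$ covered by $Q$}{
    $R \leftarrow R \cup \textit{Lookup}(g)$\;
  }
  \Return{$R$}\;
\end{algorithm}
```

The algorithm can then be cited in text as `Algorithm~\ref{alg:Example}`.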
### Adding Figures
1. Place figure files in `fig/` directory
2. Use `\begin{figure}[t]` for top placement (IEEE convention)
3. Include `\caption{}` and `\label{fig:NAME}`
4. Refer using `\ref{fig:NAME}`
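
Put together, the steps above could be sketched as follows. The file `fig/index.png` and its topic are listed under Figure Organization; the caption wording and the label `fig:index` are assumptions.

```latex
% Single-column figure placed at the top of the column.
\begin{figure}[t]
  \centering
  \includegraphics[width=\linewidth]{fig/index.png}
  \caption{Index schema design.}  % caption wording is a placeholder
  \label{fig:index}               % hypothetical label name
\end{figure}

% Referencing the figure in running text:
Fig.~\ref{fig:index} illustrates the index schema.
```

Placing `\label` after `\caption` matters: a label placed before the caption picks up the section number instead of the figure number.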
### Bibliography Updates
1. Add entries to `references.bib`
2. Use BibTeX key format: `AuthorYearKEY` (e.g., `Ma15RS_bigdata`)
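3. After modifying the `.bib` file, run `pdflatex rs_retrieval.tex` once so new citation keys reach the `.aux` file, then run `bibtex rs_retrieval`
4. Run `pdflatex rs_retrieval.tex` twice more to resolve references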
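
On the LaTeX side, citing an entry and loading the bibliography might look like the sketch below. The key `Ma15RS_bigdata` is the example key given above; the use of the `IEEEtran` BibTeX style is an assumption based on the IEEE numeric citation style, so check the existing commands in `rs_retrieval.tex` before changing them.

```latex
% In the body: cite using the AuthorYearKEY-style key.
Remote sensing data volumes continue to grow~\cite{Ma15RS_bigdata}.

% Near the end of the document, before \end{document}.
% Assumption: the paper uses the IEEEtran BibTeX style.
\bibliographystyle{IEEEtran}
\bibliography{references}  % loads references.bib (extension omitted)
```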
## Important Notes
### Compilation Workflow
When making changes that affect:
- **Text only**: Single `pdflatex` run sufficient
- **Citations**: Run `pdflatex` → `bibtex` → `pdflatex` × 2
- **New sections/labels**: Run `pdflatex` twice to resolve cross-references
- **Figures**: Ensure all figure files exist before compilation
### Git Repository
- Main branch: `main`
- Recent activity: Testing XeLaTeX compilation
- Modified files tracked: .tex, .pdf, .aux, .log, .synctex.gz
### Document Formatting
- Strict IEEE journal format compliance
- Font: Times Roman family
- Two-column layout
- Letter size paper
- 15-page final document
### Known Issues
- Some font variants (bold/italic) unavailable in current TeX distribution
- Testing migration from pdflatex to xelatex (commit f7ffed8)
## Experimental Data Reference
The paper evaluates on Martian remote sensing datasets:
- **Total volume**: 51.9 TB across 669,641 images
- **Datasets**: MoRIC, CTX, THEMIS, HiRISE
- **Environment**: 9-node cluster with HBase and Lustre file system
- **Metrics**: Latency, I/O throughput, request collapse efficiency
Results show:
- Order-of-magnitude latency reduction with I/O-aware indexing
- 54x speedup under high contention with hybrid coordination
- 2x faster recovery from workload shifts with SA-GMAB