# AGENTS.md

This file provides guidance to agents when working with code in this repository.

## Project Overview
A Python tool that fetches HTML from the Confluence API and extracts layout-preserving text.

## Project Structure

```
OrbitIn/
├── src/                    # Source modules
│   ├── __init__.py         # Package initializer
│   ├── confluence.py       # Confluence API client
│   ├── extractor.py        # HTML text extractor
│   ├── parser.py           # Log parser
│   └── database.py         # SQLite3 database operations
├── data/                   # Data directory
│   └── daily_logs.db       # SQLite3 database file
├── fetch_and_process.py    # CLI entry point
├── AGENTS.md               # Documentation for AI agents
└── layout_output.txt       # Cached layout text
```

## Core Modules

### [`ConfluenceClient`](src/confluence.py:9)
- `fetch_content(content_id, expand)` - fetch the page content
- `get_html(content_id)` - return the page HTML as a string
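
As a rough illustration of what a content request might look like, the sketch below builds a URL for the Confluence REST content endpoint (`/rest/api/content/{id}` with an `expand` query parameter). The helper name `build_content_url`, the example base URL, and the default `expand` value are assumptions for illustration; the actual request logic lives in `src/confluence.py` and may differ.

```python
from urllib.parse import urlencode


def build_content_url(base_url: str, content_id: str, expand: str = "body.storage") -> str:
    """Build a Confluence REST content URL (hypothetical helper, not the repo's code).

    base_url is assumed to already include any context path the server uses.
    """
    query = urlencode({"expand": expand})
    return f"{base_url.rstrip('/')}/rest/api/content/{content_id}?{query}"


# Placeholder host and page ID, for illustration only.
url = build_content_url("https://wiki.example.com/", "12345")
print(url)
```

The helper only assembles the URL; authentication and the HTTP call itself are left out because the source does not describe them.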

### [`HTMLTextExtractor`](src/extractor.py:12)
- `extract(html)` - extract layout-preserving text from HTML
- Uses `html.parser` (not lxml)
- Removes Confluence macro elements that carry an `ac:name` attribute
- Formats tables by aligning columns with `ljust()`
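
The `ljust()` column alignment mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the code in `src/extractor.py`: the function name `format_table`, the two-space gap, and the trailing-whitespace trimming are all assumptions.

```python
def format_table(rows: list[list[str]], gap: int = 2) -> str:
    """Align table cells into columns by padding each cell with str.ljust()."""
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [row + [""] * (n_cols - len(row)) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    lines = []
    for row in padded:
        cells = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        # Join with a fixed gap and drop padding after the last cell.
        lines.append((" " * gap).join(cells).rstrip())
    return "\n".join(lines)


# Made-up sample rows, for illustration only.
print(format_table([["Ship", "TEU"], ["Aurora", "1200"]]))
```

Note that `len()` counts code points, so full-width CJK characters would need extra handling to line up visually in a monospace terminal.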

### [`HandoverLogParser`](src/parser.py:18)
- `parse(text)` - parse the log text and return a list of `ShipLog` objects
- `ShipLog` dataclass fields: `date`, `shift`, `ship_name`, `teu`, `efficiency`, `vehicles`
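
For orientation, a dataclass with the fields listed above might look like the sketch below. The field names come from this document; the field types and the sample values are assumptions, so check `src/parser.py` for the real definition.

```python
from dataclasses import asdict, dataclass


@dataclass
class ShipLog:
    # Field names are from this document; every type here is an assumption.
    date: str          # assumed to be a date string such as "2024-01-01"
    shift: str
    ship_name: str
    teu: int
    efficiency: float
    vehicles: int


# Made-up sample record, for illustration only.
log = ShipLog("2024-01-01", "day", "Aurora", 1200, 35.5, 40)
print(asdict(log))
```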

### [`DailyLogsDatabase`](src/database.py:13)
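- `insert(log)` - insert a single record
- `insert_many(logs)` - insert records in bulk
- `query_by_date(date)` - query records by date
- `query_by_ship(ship_name)` - query records by ship name
- `query_all(limit)` - query all records
- `get_stats()` - return summary statistics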
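
The bulk-insert and query-by-ship operations could be approximated with the standard `sqlite3` module as sketched below. The table name `daily_logs`, the column types, and the free-function layout are assumptions chosen for a self-contained example; the actual schema and class live in `src/database.py`.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the (assumed) daily_logs table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_logs ("
        "date TEXT, shift TEXT, ship_name TEXT, "
        "teu INTEGER, efficiency REAL, vehicles INTEGER)"
    )
    return conn


def insert_many(conn: sqlite3.Connection, logs: list[tuple]) -> None:
    """Insert several rows in one transaction using parameterized SQL."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO daily_logs VALUES (?, ?, ?, ?, ?, ?)", logs)


def query_by_ship(conn: sqlite3.Connection, ship_name: str) -> list[tuple]:
    """Return all rows for one ship, ordered by date."""
    cur = conn.execute(
        "SELECT date, shift, ship_name, teu, efficiency, vehicles "
        "FROM daily_logs WHERE ship_name = ? ORDER BY date",
        (ship_name,),
    )
    return cur.fetchall()


# Made-up sample rows in an in-memory database, for illustration only.
conn = open_db()
insert_many(conn, [
    ("2024-01-01", "day", "Aurora", 1200, 35.5, 40),
    ("2024-01-02", "night", "Borealis", 800, 30.0, 25),
])
print(query_by_ship(conn, "Aurora"))
```

Using `?` placeholders keeps ship names and dates out of the SQL string, which avoids quoting bugs and injection issues.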

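## Text Format Conventions

- List prefixes: `•` for `ul` items, a number followed by a period for `ol` items
- Bold is rendered as `**text**`, italic as `*text*`
- Horizontal rules use the `─` (U+2500) character
- Links are rendered as `text (url)`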
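
The conventions above can be summarized as a few small rendering helpers. This is only a sketch: the function names, the single space after each list prefix, and the default rule width of 40 are assumptions, not taken from `src/extractor.py`.

```python
def render_list(items: list[str], ordered: bool = False) -> str:
    """Prefix items with '•' (ul) or '1.', '2.', ... (ol)."""
    lines = []
    for i, item in enumerate(items, start=1):
        prefix = f"{i}." if ordered else "•"
        lines.append(f"{prefix} {item}")
    return "\n".join(lines)


def render_bold(text: str) -> str:
    return f"**{text}**"


def render_italic(text: str) -> str:
    return f"*{text}*"


def render_rule(width: int = 40) -> str:
    # U+2500 BOX DRAWINGS LIGHT HORIZONTAL; the width is an assumed default.
    return "\u2500" * width


def render_link(text: str, url: str) -> str:
    return f"{text} ({url})"


print(render_list(["first", "second"], ordered=True))
print(render_link("Docs", "https://example.com"))
```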

## Commands

```bash
# Run with database storage (default)
python3 fetch_and_process.py

# Run without storing to the database
python3 fetch_and_process.py --no-db

# Test the parser module
python3 -c "from src.parser import HandoverLogParser; p = HandoverLogParser(); print(p.parse(open('layout_output.txt').read())[:3])"
```