# CCC1 — Parser test coverage: schema variants

# CCC1 — Parser test coverage: schema variants

**Repo:** `claude-code-chat-browser`  
**Audit ref:** `claude-cursor.md` **CCC1** — "Add **direct tests** for `jsonl_parser.py` covering schema variants, malformed entries, and exception paths."  
**Backlog slice:** Chen May 5 — *Parser test coverage: test schema variants* (5 pt, High).

*(After opening on GitHub, paste the issue URL here.)*

---

## Summary (from audit)

| | |
| --- | --- |
| **Finding** | `jsonl_parser.py` (617 LOC) is the project's structural core and processes untrusted, schema-evolving Claude Code JSONL files. It has **zero direct test coverage**. `_parse_tool_result` (~140 LOC) dispatches on key presence across 14 tool result shapes with no tests. Claude Code's schema is undocumented and changes without notice; a field rename or new tool type is currently invisible. |
| **Fix** | Add a dedicated `tests/test_jsonl_parser.py` covering all `_parse_tool_result` dispatch arms, `parse_session` entry-type dispatch, metadata accumulation, malformed-entry resilience, and `_normalize_content` / `_extract_text` / `_extract_images` helpers. |
| **Effort** | M |
| **Priority** | High — parser is zero-tested on untrusted, schema-evolving input. Pairs with **CCC2** (CI must exist to run the tests). |

---

## What already exists

### `tests/test_null_usage_tokens.py`

Covers one narrow slice of `_process_assistant`: null token fields in the `usage` object.

- `TestProcessAssistantNullUsage` — 8 cases asserting `null` token fields don't raise and default to 0.
- `TestParseSessionNullUsage` — 2 integration cases via temp file; null `cache_read_input_tokens` and mixed null/valid entries.
- `TestEstimateCostNullUsage` — `_estimate_cost` from `session_stats.py` (out of scope for this issue).

**Gap:** Null-usage is the only parser path with any coverage. Every other entry type, content shape, and tool result variant is untested.

---

### `tests/test_export_exclusion_filtering.py`, `tests/test_export_state.py`, `tests/test_cli_args.py`

These cover export filtering, incremental-export state, and CLI argument parity respectively. None touch `jsonl_parser.py` directly.

---

## Identified gaps

### Gap 1: `_parse_tool_result` — 14 dispatch arms, 0 tests

The function classifies a `toolUseResult` dict by key presence. Each arm:

| Result type | Key(s) used for dispatch |
| --- | --- |
| `bash` | `stdout` or `stderr` |
| `file_edit` | `structuredPatch`, or `filePath` + `newString` |
| `file_write` | `filePath` + `content` (no patch) |
| `glob` | `filenames` (list) |
| `grep` | `mode` + `numFiles` |
| `file_read` | `file` (dict with `filePath`, `numLines`, `content`) |
| `web_search` | `query` + `results` |
| `web_fetch` | `url` + `code` |
| `task` (message variant) | `task_id` or `message` |
| `task` (retrieval variant) | `retrieval_status` + `task` |
| `task` (completed subagent) | `agentId` + `totalDurationMs` |
| `task` (async launched) | `agentId` + `isAsync` |
| `todo_write` | `newTodos` or `oldTodos` |
| `user_input` | `questions` + `answers` |
| `plan` | `plan` + `filePath` |
| `unknown` | fallback |

No test exercises any of these. Schema evolution (e.g., `code` → `statusCode` in web_fetch) will be silent regressions.

---

### Gap 2: `parse_session` — entry-type dispatch

`parse_session` dispatches on `entry.get("type")` to four handlers. Not tested:

- A session with only `user` entries (no assistant).
- A session with only `assistant` entries.
- An entry with an unknown `type` (silently ignored — should stay silent).
- `isSidechain: true` incrementing `sidechain_messages`.
- `file-history-snapshot` type extracting `timestamp` from `snapshot.timestamp`.
- `entry_counts` accumulation across mixed entry types.
- Wall-clock time calculation from `first_timestamp` / `last_timestamp`.
- Empty file (zero entries) returning a valid skeleton.

---

### Gap 3: `_process_user` — metadata and content extraction

- `version`, `cwd`, `gitBranch`, `permissionMode` only captured from the **first** user entry (subsequent ones must not overwrite).
- `toolUseResult` images extracted from nested `content` list.
- Missing `message` key (`entry.get("message", {})`) — must not raise.
- `content` as a plain string vs list of typed blocks.

---

### Gap 4: `_process_assistant` — content shape variants

- `content` as a plain string (normalized to `[{type:text}]`).
- `content` as a list of strings (each becomes a text block).
- Mixed content: `text` + `thinking` + `tool_use` in one message.
- `thinking` blocks accumulated as `\n\n`-joined string.
- `tool_use` counting: multiple calls in one message increment `total_tool_calls` and `tool_call_counts` correctly.
- `isApiErrorMessage: true` increments `api_errors` without crashing.
- `stop_reason` accumulation across multiple entries.
- `cache_creation` dict with `ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens`.
- `service_tier` added to `service_tiers` set.
- `model == "<synthetic>"` must **not** be added to `models_used`.

---

### Gap 5: `_track_file_activity` — file and command tracking

- `Read` tool → `files_read` set.
- `Write` tool → `files_created` set.
- `Edit` tool → `files_written` set.
- `Bash` tool → `bash_commands` list.
- `WebFetch` → `web_fetches` list (via `url` key).
- `WebSearch` → `web_fetches` list (via `query` key).
- Tool with empty `file_path` must not add to any set.

---

### Gap 6: `_process_system` — compact boundary

- `subtype == "compact_boundary"` increments `compactions` and appends to `compact_boundaries`.
- Missing `compactMetadata` must not raise.
- Other subtypes append a system message without touching compaction metadata.

---

### Gap 7: `_normalize_content`, `_extract_text`, `_extract_images`

- `_normalize_content`: plain string, list of strings, list of dicts, mixed list, `None`/wrong type → empty list.
- `_extract_text`: only `type == "text"` blocks contribute; `tool_use` and `thinking` blocks ignored.
- `_extract_images`: base64 image blocks extracted; nested images inside `tool_result` content blocks extracted; non-image blocks skipped.

---

### Gap 8: `_infer_title` and `_strip_system_tags`

- `_infer_title`: first user message with text → truncated to 100 chars. No text messages → `"Untitled Session"`. Sidechain-only session.
- `_strip_system_tags`: each tag variant removed (`system-reminder`, `ide_opened_file`, `user-prompt-submit-hook`, etc.). Nested/malformed tags handled gracefully.

---

### Gap 9: malformed / partial entries

- Line with invalid JSON → silently skipped, parse continues.
- Entry missing `type` key → counted in `entry_counts` only if `type` present; otherwise ignored.
- Entry with `type: "assistant"` but missing `message` key → `msg = {}`, no crash.
- Entry with `usage` as `None` or a non-dict → no crash.
- `toolUseResult` as `null` → `_parse_tool_result` returns `None`.
- `toolUseResult` as a string → `_parse_tool_result` returns `None`.

---

### Gap 10: `quick_session_info`

- Small file (≤10 000 bytes): single-pass only, title and timestamps from first 80 lines.
- Large file (>10 000 bytes): tail-read path finds last timestamp correctly.
- File with no user entries → title is `"Untitled Session"`.
- File with only `system` entries → no crash, both timestamps from system lines.

---

## Proposed test cases

All tests belong in `tests/test_jsonl_parser.py`. Use `tempfile.NamedTemporaryFile` (as in `test_null_usage_tokens.py`) for integration tests; call helpers directly for unit tests.

```
TestParseToolResult
  test_bash_with_stdout
  test_bash_with_stderr_only
  test_bash_with_exit_code_and_interrupted
  test_file_edit_with_structured_patch
  test_file_edit_with_old_new_string
  test_file_write_content
  test_glob_result
  test_glob_truncated
  test_grep_result
  test_file_read_result
  test_web_search_result
  test_web_fetch_result
  test_task_message_variant
  test_task_retrieval_variant
  test_task_completed_subagent
  test_task_async_launched
  test_todo_write_result
  test_user_input_result
  test_plan_result
  test_unknown_fallback
  test_non_dict_returns_none
  test_slug_preserved

TestNormalizeContent
  test_plain_string
  test_list_of_strings
  test_list_of_dicts
  test_mixed_string_and_dict
  test_none_returns_empty
  test_wrong_type_returns_empty

TestExtractText
  test_text_blocks_joined
  test_tool_use_blocks_ignored
  test_thinking_blocks_ignored
  test_empty_content

TestExtractImages
  test_base64_image_extracted
  test_nested_tool_result_image_extracted
  test_non_image_skipped

TestInferTitle
  test_first_user_message_used
  test_truncated_to_100_chars
  test_no_text_messages_returns_untitled
  test_sidechain_only_returns_untitled

TestStripSystemTags
  test_system_reminder_removed
  test_ide_opened_file_removed
  test_user_prompt_submit_hook_removed
  test_remaining_known_opening_closing_tags_stripped
  test_clean_text_unchanged

TestProcessUser
  test_metadata_captured_from_first_entry_only
  test_missing_message_key_no_crash
  test_tool_use_result_images_extracted

TestProcessAssistant
  test_synthetic_model_not_added
  test_thinking_blocks_joined
  test_tool_use_counts_accumulated
  test_api_error_flag_increments_api_errors
  test_stop_reason_accumulated
  test_service_tier_added
  test_ephemeral_cache_tokens_accumulated

TestTrackFileActivity
  test_read_tool_adds_to_files_read
  test_write_tool_adds_to_files_created
  test_edit_tool_adds_to_files_written
  test_bash_command_appended
  test_web_fetch_url_appended
  test_web_search_query_appended
  test_empty_file_path_not_added

TestProcessSystem
  test_compact_boundary_increments_compaction
  test_compact_boundary_missing_metadata_no_crash
  test_other_subtype_no_compaction_increment

TestParseSession (integration)
  test_empty_file_returns_skeleton
  test_unknown_entry_type_silently_ignored
  test_is_sidechain_increments_counter
  test_file_history_snapshot_timestamp
  test_entry_counts_accumulated
  test_wall_time_computed
  test_invalid_json_line_skipped
  test_missing_type_key_no_crash
  test_missing_usage_dict_no_crash

TestQuickSessionInfo
  test_small_file_title_and_timestamps
  test_large_file_last_timestamp_from_tail
  test_no_user_entries_returns_untitled
```

---

## Done when

- [ ] `tests/test_jsonl_parser.py` created in `claude-code-chat-browser/tests/`.
- [ ] All 14 `_parse_tool_result` dispatch arms have at least one passing test.
- [ ] Malformed / partial entry cases (invalid JSON, missing keys, wrong types) have at least one passing test each.
- [ ] `_normalize_content`, `_extract_text`, `_extract_images` unit-tested for all input shapes.
- [ ] `parse_session` integration tests cover empty file, unknown type, sidechain counter, and wall-time.
- [ ] `quick_session_info` small-file and large-file paths tested.
- [ ] All new tests pass under `pytest` (CCC2 CI must run green).
- [ ] No test imports `_`-prefixed symbols from modules other than `jsonl_parser` itself (avoids CCC3 breach pattern).



Finding	`jsonl_parser.py` (617 LOC) is the project's structural core and processes untrusted, schema-evolving Claude Code JSONL files. It has zero direct test coverage. `_parse_tool_result` (~140 LOC) dispatches on key presence across 14 tool result shapes with no tests. Claude Code's schema is undocumented and changes without notice; a field rename or new tool type is currently invisible.
Fix	Add a dedicated `tests/test_jsonl_parser.py` covering all `_parse_tool_result` dispatch arms, `parse_session` entry-type dispatch, metadata accumulation, malformed-entry resilience, and `_normalize_content` / `_extract_text` / `_extract_images` helpers.
Effort	M
Priority	High — parser is zero-tested on untrusted, schema-evolving input. Pairs with CCC2 (CI must exist to run the tests).

Result type	Key(s) used for dispatch
`bash`	`stdout` or `stderr`
`file_edit`	`structuredPatch`, or `filePath` + `newString`
`file_write`	`filePath` + `content` (no patch)
`glob`	`filenames` (list)
`grep`	`mode` + `numFiles`
`file_read`	`file` (dict with `filePath`, `numLines`, `content`)
`web_search`	`query` + `results`
`web_fetch`	`url` + `code`
`task` (message variant)	`task_id` or `message`
`task` (retrieval variant)	`retrieval_status` + `task`
`task` (completed subagent)	`agentId` + `totalDurationMs`
`task` (async launched)	`agentId` + `isAsync`
`todo_write`	`newTodos` or `oldTodos`
`user_input`	`questions` + `answers`
`plan`	`plan` + `filePath`
`unknown`	fallback

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

# CCC1 — Parser test coverage: schema variants #21

CCC1 — Parser test coverage: schema variants

Summary (from audit)

What already exists

`tests/test_null_usage_tokens.py`

`tests/test_export_exclusion_filtering.py`, `tests/test_export_state.py`, `tests/test_cli_args.py`

Identified gaps

Gap 1: `_parse_tool_result` — 14 dispatch arms, 0 tests

Gap 2: `parse_session` — entry-type dispatch

Gap 3: `_process_user` — metadata and content extraction

Gap 4: `_process_assistant` — content shape variants

Gap 5: `_track_file_activity` — file and command tracking

Gap 6: `_process_system` — compact boundary

Gap 7: `_normalize_content`, `_extract_text`, `_extract_images`

Gap 8: `_infer_title` and `_strip_system_tags`

Gap 9: malformed / partial entries

Gap 10: `quick_session_info`

Proposed test cases

Done when

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

# CCC1 — Parser test coverage: schema variants #21

Description

CCC1 — Parser test coverage: schema variants

Summary (from audit)

What already exists

tests/test_null_usage_tokens.py

tests/test_export_exclusion_filtering.py, tests/test_export_state.py, tests/test_cli_args.py

Identified gaps

Gap 1: _parse_tool_result — 14 dispatch arms, 0 tests

Gap 2: parse_session — entry-type dispatch

Gap 3: _process_user — metadata and content extraction

Gap 4: _process_assistant — content shape variants

Gap 5: _track_file_activity — file and command tracking

Gap 6: _process_system — compact boundary

Gap 7: _normalize_content, _extract_text, _extract_images

Gap 8: _infer_title and _strip_system_tags

Gap 9: malformed / partial entries

Gap 10: quick_session_info

Proposed test cases

Done when

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`tests/test_null_usage_tokens.py`

`tests/test_export_exclusion_filtering.py`, `tests/test_export_state.py`, `tests/test_cli_args.py`

Gap 1: `_parse_tool_result` — 14 dispatch arms, 0 tests

Gap 2: `parse_session` — entry-type dispatch

Gap 3: `_process_user` — metadata and content extraction

Gap 4: `_process_assistant` — content shape variants

Gap 5: `_track_file_activity` — file and command tracking

Gap 6: `_process_system` — compact boundary

Gap 7: `_normalize_content`, `_extract_text`, `_extract_images`

Gap 8: `_infer_title` and `_strip_system_tags`

Gap 10: `quick_session_info`