Skip to content

# CCC1 — Parser test coverage: schema variants #21

@clean6378-max-it

Description

@clean6378-max-it

CCC1 — Parser test coverage: schema variants

Repo: claude-code-chat-browser
Audit ref: claude-cursor.md CCC1 — "Add direct tests for jsonl_parser.py covering schema variants, malformed entries, and exception paths."
Backlog slice: Chen May 5 — Parser test coverage: test schema variants (5 pt, High).

(After opening on GitHub, paste the issue URL here.)


Summary (from audit)

Finding jsonl_parser.py (617 LOC) is the project's structural core and processes untrusted, schema-evolving Claude Code JSONL files. It has zero direct test coverage. _parse_tool_result (~140 LOC) dispatches on key presence across 14 tool result shapes with no tests. Claude Code's schema is undocumented and changes without notice; a field rename or new tool type is currently invisible.
Fix Add a dedicated tests/test_jsonl_parser.py covering all _parse_tool_result dispatch arms, parse_session entry-type dispatch, metadata accumulation, malformed-entry resilience, and _normalize_content / _extract_text / _extract_images helpers.
Effort M
Priority High — parser is zero-tested on untrusted, schema-evolving input. Pairs with CCC2 (CI must exist to run the tests).

What already exists

tests/test_null_usage_tokens.py

Covers one narrow slice of _process_assistant: null token fields in the usage object.

  • TestProcessAssistantNullUsage — 8 cases asserting null token fields don't raise and default to 0.
  • TestParseSessionNullUsage — 2 integration cases via temp file; null cache_read_input_tokens and mixed null/valid entries.
  • TestEstimateCostNullUsage_estimate_cost from session_stats.py (out of scope for this issue).

Gap: Null-usage is the only parser path with any coverage. Every other entry type, content shape, and tool result variant is untested.


tests/test_export_exclusion_filtering.py, tests/test_export_state.py, tests/test_cli_args.py

These cover export filtering, incremental-export state, and CLI argument parity respectively. None touch jsonl_parser.py directly.


Identified gaps

Gap 1: _parse_tool_result — 14 dispatch arms, 0 tests

The function classifies a toolUseResult dict by key presence. Each arm:

Result type Key(s) used for dispatch
bash stdout or stderr
file_edit structuredPatch, or filePath + newString
file_write filePath + content (no patch)
glob filenames (list)
grep mode + numFiles
file_read file (dict with filePath, numLines, content)
web_search query + results
web_fetch url + code
task (message variant) task_id or message
task (retrieval variant) retrieval_status + task
task (completed subagent) agentId + totalDurationMs
task (async launched) agentId + isAsync
todo_write newTodos or oldTodos
user_input questions + answers
plan plan + filePath
unknown fallback

No test exercises any of these. Schema evolution (e.g., codestatusCode in web_fetch) will be silent regressions.


Gap 2: parse_session — entry-type dispatch

parse_session dispatches on entry.get("type") to four handlers. Not tested:

  • A session with only user entries (no assistant).
  • A session with only assistant entries.
  • An entry with an unknown type (silently ignored — should stay silent).
  • isSidechain: true incrementing sidechain_messages.
  • file-history-snapshot type extracting timestamp from snapshot.timestamp.
  • entry_counts accumulation across mixed entry types.
  • Wall-clock time calculation from first_timestamp / last_timestamp.
  • Empty file (zero entries) returning a valid skeleton.

Gap 3: _process_user — metadata and content extraction

  • version, cwd, gitBranch, permissionMode only captured from the first user entry (subsequent ones must not overwrite).
  • toolUseResult images extracted from nested content list.
  • Missing message key (entry.get("message", {})) — must not raise.
  • content as a plain string vs list of typed blocks.

Gap 4: _process_assistant — content shape variants

  • content as a plain string (normalized to [{type:text}]).
  • content as a list of strings (each becomes a text block).
  • Mixed content: text + thinking + tool_use in one message.
  • thinking blocks accumulated as \n\n-joined string.
  • tool_use counting: multiple calls in one message increment total_tool_calls and tool_call_counts correctly.
  • isApiErrorMessage: true increments api_errors without crashing.
  • stop_reason accumulation across multiple entries.
  • cache_creation dict with ephemeral_5m_input_tokens / ephemeral_1h_input_tokens.
  • service_tier added to service_tiers set.
  • model == "<synthetic>" must not be added to models_used.

Gap 5: _track_file_activity — file and command tracking

  • Read tool → files_read set.
  • Write tool → files_created set.
  • Edit tool → files_written set.
  • Bash tool → bash_commands list.
  • WebFetchweb_fetches list (via url key).
  • WebSearchweb_fetches list (via query key).
  • Tool with empty file_path must not add to any set.

Gap 6: _process_system — compact boundary

  • subtype == "compact_boundary" increments compactions and appends to compact_boundaries.
  • Missing compactMetadata must not raise.
  • Other subtypes append a system message without touching compaction metadata.

Gap 7: _normalize_content, _extract_text, _extract_images

  • _normalize_content: plain string, list of strings, list of dicts, mixed list, None/wrong type → empty list.
  • _extract_text: only type == "text" blocks contribute; tool_use and thinking blocks ignored.
  • _extract_images: base64 image blocks extracted; nested images inside tool_result content blocks extracted; non-image blocks skipped.

Gap 8: _infer_title and _strip_system_tags

  • _infer_title: first user message with text → truncated to 100 chars. No text messages → "Untitled Session". Sidechain-only session.
  • _strip_system_tags: each tag variant removed (system-reminder, ide_opened_file, user-prompt-submit-hook, etc.). Nested/malformed tags handled gracefully.

Gap 9: malformed / partial entries

  • Line with invalid JSON → silently skipped, parse continues.
  • Entry missing type key → counted in entry_counts only if type present; otherwise ignored.
  • Entry with type: "assistant" but missing message key → msg = {}, no crash.
  • Entry with usage as None or a non-dict → no crash.
  • toolUseResult as null_parse_tool_result returns None.
  • toolUseResult as a string → _parse_tool_result returns None.

Gap 10: quick_session_info

  • Small file (≤10 000 bytes): single-pass only, title and timestamps from first 80 lines.
  • Large file (>10 000 bytes): tail-read path finds last timestamp correctly.
  • File with no user entries → title is "Untitled Session".
  • File with only system entries → no crash, both timestamps from system lines.

Proposed test cases

All tests belong in tests/test_jsonl_parser.py. Use tempfile.NamedTemporaryFile (as in test_null_usage_tokens.py) for integration tests; call helpers directly for unit tests.

TestParseToolResult
  test_bash_with_stdout
  test_bash_with_stderr_only
  test_bash_with_exit_code_and_interrupted
  test_file_edit_with_structured_patch
  test_file_edit_with_old_new_string
  test_file_write_content
  test_glob_result
  test_glob_truncated
  test_grep_result
  test_file_read_result
  test_web_search_result
  test_web_fetch_result
  test_task_message_variant
  test_task_retrieval_variant
  test_task_completed_subagent
  test_task_async_launched
  test_todo_write_result
  test_user_input_result
  test_plan_result
  test_unknown_fallback
  test_non_dict_returns_none
  test_slug_preserved

TestNormalizeContent
  test_plain_string
  test_list_of_strings
  test_list_of_dicts
  test_mixed_string_and_dict
  test_none_returns_empty
  test_wrong_type_returns_empty

TestExtractText
  test_text_blocks_joined
  test_tool_use_blocks_ignored
  test_thinking_blocks_ignored
  test_empty_content

TestExtractImages
  test_base64_image_extracted
  test_nested_tool_result_image_extracted
  test_non_image_skipped

TestInferTitle
  test_first_user_message_used
  test_truncated_to_100_chars
  test_no_text_messages_returns_untitled
  test_sidechain_only_returns_untitled

TestStripSystemTags
  test_system_reminder_removed
  test_ide_opened_file_removed
  test_user_prompt_submit_hook_removed
  test_remaining_known_opening_closing_tags_stripped
  test_clean_text_unchanged

TestProcessUser
  test_metadata_captured_from_first_entry_only
  test_missing_message_key_no_crash
  test_tool_use_result_images_extracted

TestProcessAssistant
  test_synthetic_model_not_added
  test_thinking_blocks_joined
  test_tool_use_counts_accumulated
  test_api_error_flag_increments_api_errors
  test_stop_reason_accumulated
  test_service_tier_added
  test_ephemeral_cache_tokens_accumulated

TestTrackFileActivity
  test_read_tool_adds_to_files_read
  test_write_tool_adds_to_files_created
  test_edit_tool_adds_to_files_written
  test_bash_command_appended
  test_web_fetch_url_appended
  test_web_search_query_appended
  test_empty_file_path_not_added

TestProcessSystem
  test_compact_boundary_increments_compaction
  test_compact_boundary_missing_metadata_no_crash
  test_other_subtype_no_compaction_increment

TestParseSession (integration)
  test_empty_file_returns_skeleton
  test_unknown_entry_type_silently_ignored
  test_is_sidechain_increments_counter
  test_file_history_snapshot_timestamp
  test_entry_counts_accumulated
  test_wall_time_computed
  test_invalid_json_line_skipped
  test_missing_type_key_no_crash
  test_missing_usage_dict_no_crash

TestQuickSessionInfo
  test_small_file_title_and_timestamps
  test_large_file_last_timestamp_from_tail
  test_no_user_entries_returns_untitled

Done when

  • tests/test_jsonl_parser.py created in claude-code-chat-browser/tests/.
  • All 14 _parse_tool_result dispatch arms have at least one passing test.
  • Malformed / partial entry cases (invalid JSON, missing keys, wrong types) have at least one passing test each.
  • _normalize_content, _extract_text, _extract_images unit-tested for all input shapes.
  • parse_session integration tests cover empty file, unknown type, sidechain counter, and wall-time.
  • quick_session_info small-file and large-file paths tested.
  • All new tests pass under pytest (CCC2 CI must run green).
  • No test imports _-prefixed symbols from modules other than jsonl_parser itself (avoids CCC3 breach pattern).

Metadata

Metadata

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions