Add cc-query-chat skill for searching past sessions

Companion to cc-compact: resolves a session by id or name, then searches across user prompts, agent replies, commands, file ops, tool calls, and tool output via regexp, glob, or ranked keyword matching. Bounded output (centred snippets, --limit/--truncate tuned to ~2k tokens). cc-compact now points to it for closing concrete gaps the compact report leaves open.
2026-06-17 04:45:20 -05:00 · 2026-06-07 14:22:00 -05:00 · b27d93727d
parent b0070920e3
commit b27d93727d
3 changed files with 539 additions and 0 deletions
--- a/claude/skills/cc-compact/SKILL.md
+++ b/claude/skills/cc-compact/SKILL.md
@@ -62,3 +62,18 @@ diving in, decide whether you actually know how to proceed:
  before acting. Surface the few plausible next steps you inferred and let them
  pick or correct you. If there is any doubt, opt to ask the user first.
 ## If the compact report left something unclear
 Compaction is lossy. If — and **only if absolutely necessary** — a specific
 detail you need to proceed is missing or ambiguous (an exact command that ran,
 a file path, what a tool returned, the precise wording of an earlier request),
 search the full session with the companion **cc-query-chat** skill instead of
 reading the raw log:
 ```sh
 python3 ~/.claude/skills/cc-query-chat/query_session.py --chat '<same id or name>' --keyword 'the detail you need'
 ```
 The compact report is meant to stand on its own — reach for this only to close a
 concrete gap, never to re-read the session.
--- a/claude/skills/cc-query-chat/SKILL.md
+++ b/claude/skills/cc-query-chat/SKILL.md
@@ -0,0 +1,81 @@
 ---
 name: cc-query-chat
 description: Search a past Claude Code session by id or name — across prompts, replies, commands, files, and tool output — via regexp, glob, or ranked keyword.
 ---
 You are searching a previous Claude Code session for specific information,
 without reading its whole history. The bundled helper script resolves the
 session, walks it once, and prints only the matching items.
 **Hard rule: you may NOT read the session `.jsonl` file by any means** — no
 `cat`, `head`, `tail`, `jq`, `grep`, `Read`, nothing. Run the helper script and
 work solely from its output. That is the only thing allowed to touch the file.
 The helper lives at `~/.claude/skills/cc-query-chat/query_session.py`.
 ## Step 1: Resolve the session
 The session is identified by id or name, passed via `--chat`.
 The helper auto-detects whether the value is a UUID or a title.
 Titles are matched case-insensitively as substrings; if several sessions match, the newest is used.
 If you already know whether it is a title or an id, `--title`/`--id`/`--file` also work.
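
The auto-detection in Step 1 can be sketched in a few lines. This is a minimal illustration rather than the helper's full resolution logic: `classify_chat_arg` and `title_matches` are hypothetical names, and the pattern assumes the canonical 8-4-4-4-12 hex UUID layout.

```python
import re

# Assumed canonical UUID layout: 8-4-4-4-12 hex digits, any letter case.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)


def classify_chat_arg(value):
    """Return "id" if the value looks like a UUID, otherwise "title"."""
    return "id" if UUID_RE.match(value.strip()) else "title"


def title_matches(needle, title):
    """Case-insensitive substring test, the rule described for titles."""
    return needle.lower() in title.lower()
```

Anything that does not look like a UUID falls through to the title lookup, so a partial id would be treated as a title fragment.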
 ## Step 2: Choose a matching scheme
 Exactly one of these is required:
 - **`--regexp PATTERN`** — Python regex, searched anywhere in each item.
  Exact filter: an item either matches or it doesn't.
  ```sh
  ... --chat '<id or name>' --regexp 'ab.*c'
  ```
 - **`--glob PATTERN`** — shell glob (`*`, `?`, `[..]`), searched anywhere in
  each item. Exact filter.
  ```sh
  ... --chat '<id or name>' --glob 'ab*c'
  ```
 - **`--keyword "WORDS"`** — space-separated keywords with partial (substring)
  matches. Results are **ranked**: the more distinct keywords an item matches
  the higher it scores (ties broken by total hit count), best first.
  ```sh
  ... --chat '<id or name>' --keyword 'data pipeline error delivery'
  ```
 regexp and glob are case-insensitive by default (pass `--case-sensitive` to
 change that); keyword is always case-insensitive.
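
To make the keyword ranking concrete, here is a small sketch of the scoring rule described above: count distinct keywords matched, break ties by total hits, and drop items that match nothing. `rank_by_keywords` is a hypothetical name and the sample items are invented for illustration.

```python
def rank_by_keywords(items, query):
    """Rank text items by distinct keywords matched, then by total hits."""
    words = [w.lower() for w in query.split()]
    scored = []
    for text in items:
        hay = text.lower()
        # Substring counts, so "pipe" would also hit "pipeline".
        counts = [hay.count(w) for w in words]
        distinct = sum(1 for c in counts if c)
        if distinct:  # items matching no keyword are not returned
            scored.append((distinct, sum(counts), text))
    # Best first: more distinct keywords, then more total hits.
    scored.sort(key=lambda r: (-r[0], -r[1]))
    return scored


# Invented sample items, for illustration only.
items = [
    "delivery failed",
    "pipeline error: pipeline stalled",
    "data pipeline error during delivery",
    "unrelated note",
]
ranked = rank_by_keywords(items, "data pipeline error delivery")
for distinct, total, text in ranked:
    print(f"{distinct}/4 kw, {total} hits: {text}")
```

Because ranking favours breadth over repetition, an item that mentions every keyword once outranks one that repeats a single keyword many times.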
 ## Step 3: Narrow and shape the results
 - **`--type LIST`** — restrict to item types (comma-separated). Types:
  - `user` — the human's prompts
  - `agent` — the assistant's text replies
  - `command` — Bash commands (command + description)
  - `file` — file operations (Read/Write/Edit/MultiEdit/NotebookEdit): path + content
  - `tool` — other tool calls (name + input)
  - `output` — tool results: command output, file contents read back, etc.
  e.g. `--type command,output` to search only what was run and what it printed.
 - **`--limit N`** — max results (default 5).
 - **`--truncate N`** — max chars of matched text per result; the shown snippet
  is centred on the match.
 The defaults are tuned so that output is short enough not to pollute context.
 Do not raise them without good reason. If you raise `--truncate`, consider lowering `--limit`, and vice versa.
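
The effect of `--truncate` can be pictured with a simplified sketch of a centred window. `centred_snippet` is a hypothetical name; the sketch assumes the match span is given as `[start, end)` character offsets and ignores whitespace trimming.

```python
def centred_snippet(text, start, end, n):
    """Return about n chars of text around the match span [start, end)."""
    if len(text) <= n:
        return text  # short enough: show everything
    span = end - start
    if span >= n:
        return text[start:start + n]  # match alone fills the budget
    pad = (n - span) // 2  # context to show on each side of the match
    lo = max(0, start - pad)
    hi = min(len(text), lo + n)
    lo = max(0, hi - n)  # re-anchor if the window ran into the end
    # Ellipses mark clipped sides; they are added on top of the n chars.
    return ("…" if lo > 0 else "") + text[lo:hi] + ("…" if hi < len(text) else "")


middle = centred_snippet("a" * 50 + "MATCH" + "b" * 50, 50, 55, 15)
tail = centred_snippet("x" * 100 + "END", 100, 103, 10)
```

A larger `--truncate` widens this window for every result, which is why it trades off against `--limit`.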
 ## Step 4: Read the output
 It prints an XML-tagged report: a header with how many items were searched and
 matched, then one `<result>` per hit with its `type`, `turn` number, timestamp,
 keyword score (for keyword mode), a `<label>` (e.g. the command or file path),
 and the matched `<match>` snippet.
 Iterate if needed rather than reading the raw log: refine the pattern, switch
 schemes (keyword to discover, regexp/glob to pinpoint), add a `--type` filter,
 or raise `--limit`/`--truncate`. Avoid over-querying: if you don't find what
 you're looking for after a few queries, stop.
 If a search needs many queries, start a sub-agent and instruct it to use this
 tool, so the digging does not pollute your own context.
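
If the report ever needs to be consumed programmatically, its shape lends itself to a standard XML parser. The sketch below assumes the report is well-formed XML (unusual characters in tool output could break that); `parse_report` is a hypothetical name and `SAMPLE` is an invented report in the documented shape, not real output.

```python
import xml.etree.ElementTree as ET

# Invented sample in the documented shape; not real helper output.
SAMPLE = """<chat-query mode="keyword" query="pipeline error" searched-items="42" matched="2" showing="2">
  <file>/tmp/example-session.jsonl</file>
  <result n="1" type="command" turn="3" score="2/2 kw, 3 hits" ts="2026-01-01T00:00:00Z">
    <label>make pipeline</label>
    <match>
pipeline error: step failed
    </match>
  </result>
  <result n="2" type="output" turn="3" score="1/2 kw, 1 hits">
    <label>output of make pipeline</label>
    <match>
error: exit 1
    </match>
  </result>
</chat-query>
"""


def parse_report(xml_text):
    """Return (header attributes, list of result dicts) from a report."""
    root = ET.fromstring(xml_text)
    hits = []
    for res in root.iter("result"):
        hits.append({
            "type": res.get("type"),
            "turn": int(res.get("turn")),
            "score": res.get("score"),  # None outside keyword mode
            "label": res.findtext("label", default="").strip(),
            "match": res.findtext("match", default="").strip(),
        })
    return dict(root.attrib), hits


header, hits = parse_report(SAMPLE)
```

Comparing `matched` with `showing` in the header tells you whether raising `--limit` would reveal more hits.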
--- a/claude/skills/cc-query-chat/query_session.py
+++ b/claude/skills/cc-query-chat/query_session.py
@@ -0,0 +1,443 @@
 #!/usr/bin/env python3
 """Search a Claude Code session log without ingesting its entire history.
 This is the search companion to cc-compact. Where cc-compact gives a bounded
 overview of a session, this lets you *search* one — across everything the
 session contains:
  - user     the human's typed prompts
  - agent    the assistant's text replies
  - command  Bash commands (the command + its description)
  - file     file operations (Read/Write/Edit/MultiEdit/NotebookEdit) — path + content
  - tool     other tool calls (the tool name + its input)
  - output   tool results — command output, file contents read back, search
             hits, etc. (labelled with the tool/arg that produced them)
 Three matching schemes (pick one):
  --regexp PATTERN   Python regular expression, searched anywhere in the item
  --glob PATTERN     shell-style glob (*, ?, [..]), searched anywhere
  --keyword "a b c"  space-separated keywords; partial (substring) matches
                     count. Items are ranked: more distinct keywords matched =
                     better, ties broken by total hit count. Only items that
                     match at least one keyword are returned.
 regexp/glob are exact filters — an item is either a match or it isn't.
 keyword is fuzzy and ranked.
 Session resolution (pick one):
  --chat TEXT        a session id (UUID) OR a substring of its ai-title.
                     Auto-detected: looks like a UUID -> treated as id.
  --file PATH        use this JSONL file directly
  --id UUID          find <UUID>.jsonl under the projects dir
  --title TEXT       find the session whose ai-title contains TEXT
 Other options:
  --type LIST        restrict to item types (comma-separated): any of
                     user,agent,command,file,tool,output
  --limit N          max results to show (default 5)
  --truncate N       max chars of matched text shown per result; the snippet
                     is centred on the match so you see the relevant part
                     (default 900 — tuned for a ~2k token total)
  --case-sensitive   make regexp/glob matching case-sensitive (default: not)
 Sessions live at: ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl
 """
 import argparse
 import fnmatch  # noqa: F401  (kept for reference; we hand-roll glob->regex)
 import glob
 import json
 import os
 import re
 import sys
 PROJECTS_DIR = os.path.expanduser("~/.claude/projects")
 UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I)
 # Tools that touch files on disk, and which input field holds the path.
 FILE_TOOLS = {
    "Edit": "file_path",
    "Write": "file_path",
    "MultiEdit": "file_path",
    "NotebookEdit": "notebook_path",
    "Read": "file_path",
 }
 ALL_TYPES = ["user", "agent", "command", "file", "tool", "output"]
 # --------------------------------------------------------------------------- #
 # text helpers
 # --------------------------------------------------------------------------- #
 def esc(text):
    """Escape XML metacharacters for tag values and attributes."""
    # Also escape double quotes: values are interpolated into "..." attributes.
    return str(text).replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace('"', "&quot;")
 def oneline(text, n):
    """Collapse whitespace to a single line, then clip to n chars."""
    text = " ".join(str(text).split())
    return text if len(text) <= n else text[: n - 1] + "…"
 def content_to_text(content):
    """A message's .content is either a string or a list of typed blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for b in content:
            if isinstance(b, dict) and b.get("type") == "text":
                parts.append(b.get("text", ""))
        return "\n".join(p for p in parts if p)
    return ""
 def tool_result_text(content):
    """tool_result .content is a string or a list of {type,text|...} blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for b in content:
            if not isinstance(b, dict):
                continue
            if b.get("type") == "text":
                parts.append(b.get("text", ""))
            elif b.get("type") == "image":
                parts.append("[image]")
        return "\n".join(p for p in parts if p)
    return ""
 def glob_to_regex(pat):
    """Translate a shell glob to a regex fragment we can re.search() anywhere.
    Unlike fnmatch (which anchors and matches the *whole* string), this is
    meant to find the pattern somewhere inside long, multi-line text — so
    `ab*c` matches any text containing 'ab' ... 'c'.
    """
    out = []
    i, n = 0, len(pat)
    while i < n:
        c = pat[i]
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "[":
            j = i + 1
            if j < n and pat[j] in "!^":
                j += 1
            if j < n and pat[j] == "]":
                j += 1
            while j < n and pat[j] != "]":
                j += 1
            if j >= n:
                out.append(r"\[")  # unterminated class -> literal
            else:
                stuff = pat[i + 1 : j]
                if stuff.startswith("!"):
                    stuff = "^" + stuff[1:]
                out.append("[" + stuff + "]")
                i = j
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)
 def snippet(text, start, end, n):
    """Return up to n chars of `text` centred on the span [start, end),
    with ellipses marking where it was clipped. Newlines preserved."""
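    # Strip surrounding newlines, shifting the span so it still points at the match.
    lead = len(text) - len(text.lstrip("\n"))
    text = text.strip("\n")
    start = min(max(0, start - lead), len(text))
    end = min(max(start, end - lead), len(text))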
    if len(text) <= n:
        return text
    span = end - start
    if span >= n:
        return text[start : start + n].rstrip() + " …[truncated]"
    pad = (n - span) // 2
    lo = max(0, start - pad)
    hi = min(len(text), lo + n)
    lo = max(0, hi - n)
    out = text[lo:hi].strip()
    if lo > 0:
        out = "…" + out
    if hi < len(text):
        out = out + "…"
    return out
 # --------------------------------------------------------------------------- #
 # session resolution
 # --------------------------------------------------------------------------- #
 def find_by_title(needle):
    needle = needle.lower()
    candidates = []  # (mtime, path, title)
    for path in glob.glob(os.path.join(PROJECTS_DIR, "**", "*.jsonl"), recursive=True):
        title = None
        try:
            with open(path, encoding="utf-8") as fh:
                for line in fh:
                    if '"ai-title"' not in line:
                        continue
                    try:
                        rec = json.loads(line)
                    except json.JSONDecodeError:
                        continue
                    if rec.get("type") == "ai-title":
                        t = rec.get("aiTitle", "")
                        if needle in t.lower():
                            title = t
                            break
        except OSError:
            continue
        if title is not None:
            candidates.append((os.path.getmtime(path), path, title))
    if not candidates:
        sys.exit(f"No session whose ai-title contains {needle!r}")
    candidates.sort(reverse=True)
    if len(candidates) > 1:
        sys.stderr.write("Multiple title matches (using newest):\n")
        for _, path, title in candidates:
            sys.stderr.write(f"  {path}  —  {title}\n")
    return candidates[0][1]
 def find_by_id(uuid):
    matches = glob.glob(os.path.join(PROJECTS_DIR, "**", f"{uuid}.jsonl"), recursive=True)
    if not matches:
        sys.exit(f"No session file found for id {uuid} under {PROJECTS_DIR}")
    return matches[0]
 def resolve_file(args):
    if args.file:
        return os.path.expanduser(args.file)
    if args.id:
        return find_by_id(args.id)
    if args.title:
        return find_by_title(args.title)
    if args.chat:
        chat = args.chat.strip()
        if UUID_RE.match(chat):
            return find_by_id(chat)
        return find_by_title(chat)
    sys.exit("Provide one of --chat, --file, --id, or --title")
 # --------------------------------------------------------------------------- #
 # item extraction
 # --------------------------------------------------------------------------- #
 def extract_items(path):
    """Walk the log once and produce a flat, ordered list of searchable items.
    Each item: {"type", "turn", "ts", "label", "text"}.
    """
    items = []
    turn = 0
    tool_meta = {}  # tool_use_id -> (tool_name, short_arg)
    def add(itype, label, text, ts):
        if not text or not text.strip():
            return
        items.append({"type": itype, "turn": turn, "ts": ts, "label": label, "text": text})
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            rtype = rec.get("type")
            ts = rec.get("timestamp")
            msg = rec.get("message") if isinstance(rec.get("message"), dict) else {}
            content = msg.get("content")
            if rtype == "user" and not rec.get("isMeta"):
                # Genuine prompt = plain string that isn't a wrapper/interrupt.
                if isinstance(content, str):
                    s = content.lstrip()
                    if s and not s.startswith(("<", "[Request interrupted", "Caveat:")):
                        turn += 1
                        add("user", "user prompt", content, ts)
                # tool_result blocks live inside user messages
                if isinstance(content, list):
                    for b in content:
                        if isinstance(b, dict) and b.get("type") == "tool_result":
                            tid = b.get("tool_use_id")
                            name, arg = tool_meta.get(tid, ("tool", ""))
                            label = f"{name} {arg}".strip()
                            add("output", f"output ← {label}", tool_result_text(b.get("content")), ts)
            elif rtype == "assistant" and isinstance(content, list):
                texts = []
                for b in content:
                    if not isinstance(b, dict):
                        continue
                    bt = b.get("type")
                    if bt == "text" and b.get("text", "").strip():
                        texts.append(b["text"])
                    elif bt == "tool_use":
                        name = b.get("name", "?")
                        inp = b.get("input", {}) or {}
                        tid = b.get("id")
                        if name == "Bash":
                            cmd = str(inp.get("command", ""))
                            desc = str(inp.get("description", ""))
                            body = cmd if not desc else f"{cmd}\n# {desc}"
                            add("command", oneline(cmd, 120), body, ts)
                            tool_meta[tid] = (name, oneline(cmd, 60))
                        elif name in FILE_TOOLS:
                            fp = str(inp.get(FILE_TOOLS[name], ""))
                            pieces = [f"[{name}] {fp}"]
                            for k in ("content", "old_string", "new_string"):
                                if inp.get(k):
                                    pieces.append(str(inp[k]))
                            add("file", f"{name} {fp}", "\n".join(pieces), ts)
                            tool_meta[tid] = (name, oneline(os.path.basename(fp) or fp, 60))
                        else:
                            try:
                                body = json.dumps(inp, ensure_ascii=False, indent=2)
                            except (TypeError, ValueError):
                                body = str(inp)
                            add("tool", name, f"[{name}]\n{body}", ts)
                            arg = inp.get("file_path") or inp.get("path") or inp.get("query") or ""
                            tool_meta[tid] = (name, oneline(str(arg), 60))
                if texts:
                    add("agent", "agent reply", "\n".join(texts), ts)
    return items
 # --------------------------------------------------------------------------- #
 # matching
 # --------------------------------------------------------------------------- #
 def match_regex(items, pattern, flags):
    try:
        rx = re.compile(pattern, flags | re.DOTALL)
    except re.error as e:
        sys.exit(f"Invalid regular expression: {e}")
    results = []
    for it in items:
        m = rx.search(it["text"])
        if m:
            results.append((it, m.start(), m.end(), None))
    return results
 def match_keyword(items, query, flags):
    words = [w for w in query.split() if w]
    if not words:
        sys.exit("--keyword needs at least one word")
    ci = flags & re.IGNORECASE
    scored = []
    for it in items:
        text = it["text"]
        hay = text.lower() if ci else text
        distinct = 0
        total = 0
        first_pos = None
        first_len = 0
        for w in words:
            needle = w.lower() if ci else w
            cnt = hay.count(needle)
            if cnt:
                distinct += 1
                total += cnt
                pos = hay.find(needle)
                if first_pos is None or pos < first_pos:
                    first_pos = pos
                    first_len = len(needle)
        if distinct:
            scored.append((distinct, total, it, first_pos, first_pos + first_len))
    # Best first: more distinct keywords, then more total hits.
    scored.sort(key=lambda r: (-r[0], -r[1]))
    out = []
    for distinct, total, it, s, e in scored:
        out.append((it, s, e, f"{distinct}/{len(words)} kw, {total} hits"))
    return out
 # --------------------------------------------------------------------------- #
 # main
 # --------------------------------------------------------------------------- #
 def main():
    ap = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    # session resolution
    ap.add_argument("--chat", help="session id (UUID) or substring of its ai-title")
    ap.add_argument("--file", help="path to a session .jsonl")
    ap.add_argument("--id", help="session UUID")
    ap.add_argument("--title", help="substring of the session's ai-title")
    # match scheme (exactly one)
    mode = ap.add_mutually_exclusive_group(required=True)
    mode.add_argument("--regexp", metavar="PATTERN", help="regular-expression search")
    mode.add_argument("--glob", metavar="PATTERN", help="shell-glob search")
    mode.add_argument("--keyword", metavar="WORDS", help="ranked keyword search")
    # filters / shaping
    ap.add_argument("--type", help="restrict to item types (comma-separated): " + ",".join(ALL_TYPES))
    ap.add_argument("--limit", type=int, default=5, help="max results (default 5)")
    ap.add_argument("--truncate", type=int, default=900, help="max chars of matched text per result (default 900)")
    ap.add_argument("--case-sensitive", action="store_true", help="case-sensitive regexp/glob (default off)")
    args = ap.parse_args()
    path = resolve_file(args)
    wanted = None
    if args.type:
        wanted = {t.strip().lower() for t in args.type.split(",") if t.strip()}
        bad = wanted - set(ALL_TYPES)
        if bad:
            sys.exit(f"Unknown --type value(s): {', '.join(sorted(bad))}. Choose from {', '.join(ALL_TYPES)}")
    items = extract_items(path)
    if wanted:
        items = [it for it in items if it["type"] in wanted]
    flags = 0 if args.case_sensitive else re.IGNORECASE
    if args.regexp is not None:
        scheme, query = "regexp", args.regexp
        results = match_regex(items, args.regexp, flags)
    elif args.glob is not None:
        scheme, query = "glob", args.glob
        results = match_regex(items, glob_to_regex(args.glob), flags)
    else:
        scheme, query = "keyword", args.keyword
        # keyword is always case-insensitive for friendliness
        results = match_keyword(items, args.keyword, re.IGNORECASE)
    shown = results[: args.limit]
    out = sys.stdout.write
    out(
        f'<chat-query mode="{scheme}" query="{esc(query)}" '
        f'searched-items="{len(items)}" matched="{len(results)}" showing="{len(shown)}">\n'
    )
    out(f"  <file>{esc(path)}</file>\n")
    if not shown:
        out("  <no-matches />\n")
    for n, (it, s, e, note) in enumerate(shown, 1):
        attrs = f'n="{n}" type="{it["type"]}" turn="{it["turn"]}"'
        if note:
            attrs += f' score="{esc(note)}"'
        if it["ts"]:
            attrs += f' ts="{esc(it["ts"])}"'
        out(f"  <result {attrs}>\n")
        out(f"    <label>{esc(oneline(it['label'], 200))}</label>\n")
        snip = snippet(it["text"], s if s is not None else 0, e if e is not None else 0, args.truncate)
        out(f"    <match>\n{esc(snip)}\n    </match>\n")
        out("  </result>\n")
    out("</chat-query>\n")
 if __name__ == "__main__":
    main()