Revision history for LLM::Chat

0.5.1  2026-04-29T23:47:12+01:00
    - Bump GitHub Actions to use Node 24+
    - LLM::Chat::Backend::OpenAICommon gains a symmetric
      `_on-blocking-complete` hook on the non-streaming path,
      mirroring the existing `_on-stream-complete` contract: fires
      after the response body has been parsed and `_lift-usage`
      has lifted OAI/provider usage fields, before
      `$response.done`. Default implementation is a no-op.
      Subclasses use it to attach post-call metadata that isn't
      in the body itself — symmetric with what was already
      possible on streams.
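      A sketch of what a subclass override might look like (the
      hook name is from this entry; the signature and the helper
      are assumptions):

          class My::Provider is LLM::Chat::Backend::OpenAICommon {
              # runs after the body is parsed and _lift-usage has
              # run, before $response.done resolves
              method _on-blocking-complete($response) {
                  # hypothetical post-call metadata fetch
                  my %meta = self!fetch-extra-metadata;
                  # ... attach %meta to $response here ...
              }
              # hypothetical helper, stubbed for the sketch
              method !fetch-extra-metadata { %() }
          }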
    - LLM::Chat::Backend::OpenRouter wires the new hook through
      to the same `/generation?id=...` lookup the streaming path
      already uses, so blocking callers (e.g. App::Storygen, which
      calls `chat-completion` rather than `chat-completion-stream`)
      now see `.cost` populated by the time `$response.done` fires.
      Before this fix, dropping `usage: { include: true }` in
      0.5.0 had silently regressed blocking calls:
      `Response::OpenRouter.cost` stayed Nil on every one.
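      Illustrative blocking call, assuming .done is awaitable and
      chat-completion takes the message list positionally (the
      call shape is not taken from the module's docs):

          my $backend;    # assume: a configured OpenRouter backend
          my @messages;   # chat history, elided
          my $resp = $backend.chat-completion(@messages);
          await $resp.done;   # _on-blocking-complete has run by now
          say $resp.cost;     # populated instead of Nil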
    - LLM::Chat::Backend::OpenRouter refactor: the lookup logic
      is now in a private `!fetch-generation-metadata` helper that
      both `_on-stream-complete` and `_on-blocking-complete`
      delegate to. No behaviour change on the streaming path.
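      The delegation shape, sketched (an excerpt-shaped sketch,
      not the real class; signatures assumed):

          class LLM::Chat::Backend::OpenRouter {
              method !fetch-generation-metadata($response) { ... }
              method _on-stream-complete($response)   {
                  self!fetch-generation-metadata($response);
              }
              method _on-blocking-complete($response) {
                  self!fetch-generation-metadata($response);
              }
          }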
    - Tests — t/12-openrouter-backend.rakutest gains a subtest
      covering the defensive guards on both completion hooks
      (no-op when generation-id is undefined; no crash when the
      response isn't OR-augmented). Plan goes 11 → 12; total
      LLM::Chat tests 130.

0.5.0  2026-04-27T22:52:25+01:00
    - LLM::Chat::Backend::OpenRouter request shape now mirrors
      SillyTavern's wire bytes verbatim. Removed two body fields
      that were causing OpenRouter's upstream router to hold 200 OK
      headers indefinitely against some providers (~80% header-phase
      timeouts in App::Cantina vs ~0% in SillyTavern on the same
      models / keys / network):
        * `usage: { include: true }` — no longer sent.
        * `stream_options: { include_usage: true }` — no longer sent
          (also removed from OpenAICommon.chat-completion-stream so
          all OAI-compatible streams now match this shape).
      Also trimmed one field (a reshape, not a removal):
        * `reasoning: { effort, enabled }` — `enabled` key dropped;
          we now send only `{ effort }` when reasoning_effort is
          configured, matching ST.
      Added on every request, also matching ST:
        * `include_reasoning: Bool` — Boolean parity with ST's flag.
        * `top_k` — plumbed from Settings into the OAI body.
      `repetition_penalty` is now omitted when at the default 1.0
      (was previously sent unconditionally).
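      Sketch of the resulting body shape (placeholder values; not
      code from the module):

          my @messages;   # chat history, elided
          my %body =
              model             => 'example/model',
              messages          => @messages,
              include_reasoning => True,  # new; Bool parity with ST
              top_k             => 40,    # new; plumbed from Settings
              # only when reasoning_effort is configured, and
              # without the old `enabled` key:
              reasoning         => %( effort => 'medium' ),
          ;
          # `usage` and `stream_options` are no longer sent;
          # `repetition_penalty` is added only when it isn't 1.0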
    - Cost telemetry that the inline `usage: { include: true }`
      block used to carry now arrives via a one-shot post-stream
      GET against `/generation?id=...` after `[DONE]`. Lookup is
      async and best-effort; on failure $resp.cost stays Nil rather
      than escalating. Lifts cost, provider-name, and (when not
      already populated from the stream) prompt/completion tokens.
      Latency: expect ~50–200ms between .is-done becoming True
      and .cost becoming readable. New hook _on-stream-complete
      on OpenAICommon so
      future provider subclasses can do the same kind of
      post-stream metadata fetch.
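      Roughly the shape of the lookup (the endpoint path is from
      this entry; the host, response key, and exact Cro call are
      assumptions):

          use Cro::HTTP::Client;
          my $gen-id  = '...';   # id captured from the stream
          my $api-key = '...';
          start {
              my $r = await Cro::HTTP::Client.get(
                  "https://openrouter.ai/api/v1/generation?id=$gen-id",
                  headers => [ Authorization => "Bearer $api-key" ],
              );
              my %data = (await $r.body)<data> // %();
              # lift cost / provider-name / token counts from %data;
              # on any failure leave .cost as Nil
              CATCH { default { } }   # swallow: best-effort only
          }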
    - LLM::Chat::Backend::OpenAICommon stream parser now buffers
      bytes across body-byte-stream emissions and splits on the
      SSE `\n\n` event delimiter before parsing, instead of
      decoding+parsing each TCP chunk independently. Before the
      fix, a `data: {...}` JSON object split across two TCP
      packets would crash from-json on the truncated half and
      terminate the stream with error-class 'unknown'. Both
      chat-completion-stream and text-completion-stream got the
      fix. Heartbeat / SSE
      comment lines (`: OPENROUTER PROCESSING`) are dropped per
      spec — never produce a chunk.
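      The general shape of the fix, as a standalone sketch (not
      the module's actual parser code):

          my $buf = Buf.new;
          sub on-chunk(Blob $chunk, &emit-event) {
              $buf ~= $chunk;
              # utf8-c8 tolerates a multi-byte char split mid-chunk
              my @events = $buf.decode('utf8-c8').split("\n\n");
              # the last piece may be a partial event: re-buffer it
              $buf = @events.pop.encode('utf8-c8');
              for @events -> $ev {
                  # events that are only SSE comments are dropped
                  next if $ev.lines.all.starts-with(':');
                  emit-event($ev) if $ev.trim;
              }
          }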
    - LLM::Chat::Backend::OpenAICommon.!classify-exception no
      longer string-matches "timeout" / "timed out" in the default
      arm; only X::Cro::HTTP::Client::Timeout maps to error-class
      'timeout' now. Substring matching was masking unrelated
      errors (JSON parse failures, stream-cancel messages) as
      header timeouts. Connection-error pattern (refused / reset /
      DNS / unreachable) stayed — those don't have a typed
      exception class to discriminate on.
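      The narrowed classification, sketched (the real code is the
      private !classify-exception method; the connection pattern
      here is illustrative):

          use Cro::HTTP::Client;   # typed timeout exception
          sub classify-error($ex --> Str) {
              given $ex {
                  when X::Cro::HTTP::Client::Timeout { 'timeout' }
                  when .message.lc ~~
                       / refused | reset | unreachable | dns / {
                      'connection'
                  }
                  # no more "timeout" substring matching here
                  default { 'unknown' }
              }
          }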
    - LLM::Chat::Debug log format gains elapsed-ms timestamps for
      streaming requests: HEADERS RECEIVED, FIRST BODY BYTE, and
      EXCEPTION lines all carry "+Nms" relative to the call start
      so latency can be diagnosed without external instrumentation.
      Existing log labels unchanged.

0.3.0  2026-04-23T15:56:28+01:00
    - LLM::Chat::Backend::Response gains structured error metadata:
      $.error-status (Int HTTP code) and $.error-class (Str —
      'http' / 'timeout' / 'connection' / 'response' / 'unknown').
      Populated via _set-error-info(:$class, :$status) alongside
      the existing .quit path. Lets consumers branch on error kind
      without regex-parsing raw messages — used by the
      LLM::Data::Inference::Task model-fallback policy.
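      Sketch of the consumer-side branching this enables
      (retry-same-model / advance-model are hypothetical policy
      helpers, not part of LLM::Chat):

          sub retry-same-model() { }   # hypothetical
          sub advance-model()    { }   # hypothetical
          my $resp;   # assume: a Response that quit with an error
          given $resp.error-class {
              when 'timeout' | 'connection' { retry-same-model }
              when 'http' {
                  advance-model if ($resp.error-status // 0) >= 500;
              }
              when 'response' { advance-model }  # finish-reason quits
              default         { advance-model }  # 'unknown'
          }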
    - LLM::Chat::Backend::OpenAICommon CATCH blocks classify Cro
      exceptions (X::Cro::HTTP::Error — picks up status off
      .response.status; X::Cro::HTTP::Client::Timeout) plus
      heuristic socket-error detection into the Response's error
      fields before quitting. Finish-reason quits (length /
      content_filter / unknown) are tagged error-class => 'response'
      so the fallback layer advances on them.
    - LLM::Chat::Backend::Mock gains &.error-producer — an optional
      (Int $call-index --> Hash) callback that scripts per-call
      failures for fallback / retry tests. Returning a hash like
      { class => 'http', status => 500, message => 'x' } fails
      that call without consuming a slot from @.responses. Also
      exposes $.call-index (monotonic per-backend call count,
      bumped on every completion regardless of outcome).
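      Scripting a one-off failure, sketched (constructor shape,
      the 0-based index, and a non-hash "no error" return are all
      assumptions):

          my $mock = LLM::Chat::Backend::Mock.new(
              responses      => [ 'first reply', 'second reply' ],
              error-producer => -> Int $call-index {
                  $call-index == 1
                      ?? { class   => 'http',
                           status  => 500,
                           message => 'scripted failure' }
                      !! Nil   # assumed: means "no error this call"
              },
          );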

0.2.6  2026-04-13T17:03:42+01:00
    - LLM::Chat::Backend::Mock: new test-only backend that returns
      canned responses in order. Supports streaming (default splits
      on whitespace, configurable via :token-splitter), optional delay
      between tokens via :stream-delay, and an :initial-delay (10ms
      default) so consumers can attach taps before tokens flow.
      :fail-on-empty makes the backend die when the response queue
      is exhausted instead of repeating the last entry. Useful for
      exercising error paths in downstream consumers.
      Recording: every completion call is logged to @.recorded-calls
      as a hash with kind / messages / tools / response / at. Tests
      can assert on what reached the backend, not just what came
      back — catches prompt-assembly and template-substitution
      regressions. clear-recorded-calls resets the log between
      test phases.
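      Typical test flow, sketched (constructor argument names
      follow the attributes above; the consumer sub is
      hypothetical):

          use Test;
          sub exercise-consumer($backend) { }  # code under test
          my $mock = LLM::Chat::Backend::Mock.new(
              responses     => [ 'canned reply' ],
              stream-delay  => 0.01,
              fail-on-empty => True,
          );
          exercise-consumer($mock);
          is $mock.recorded-calls.elems, 1,
              'one call reached the backend';
          ok $mock.recorded-calls[0]<messages>,
              'messages were recorded';
          $mock.clear-recorded-calls;   # reset between phases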

0.2.5  2026-04-09T12:30:48+01:00
    - Optional :@tools parameter on chat-completion and chat-completion-stream
    - Response.tool-calls and has-tool-calls for detecting LLM tool call requests
    - Response.finish-reason field
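      Sketch of a tool-call round trip (the tool-spec hash follows
      the OAI convention and the call shape is assumed):

          my $backend;    # assume: any LLM::Chat backend
          my @messages;   # chat history, elided
          my @tools = [ { type     => 'function',
                          function => { name => 'get-weather' } }, ];
          my $resp = $backend.chat-completion(@messages, :@tools);
          await $resp.done;
          if $resp.has-tool-calls {
              .say for $resp.tool-calls;
          }
          say $resp.finish-reason;   # e.g. 'tool_calls'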

0.2.4  2026-04-09T05:15:41+01:00
    - CI: exclude Windows (Tokenizers Rust FFI build not yet supported)

0.2.3  2026-04-09T05:09:48+01:00
    - Add GitHub Actions CI workflow with Rust toolchain for Tokenizers
    - Add dist.ini for mi6 (UploadToZef, ReadmeFromPod, Badges)
    - Add docs/Readme.rakudoc

0.2.2  2026-04-09T04:59:41+01:00
    - Add stub LLM::Chat to make mi6 stop renaming the module.

0.2.1  2026-04-09T04:41:36+01:00
    - Add LLM::Chat::Template::Jinja2 for HuggingFace chat template support
    - from-tokenizer-config class method loads templates from tokenizer_config.json
    - Supports bos_token, eos_token passthrough
    - Continuation mode maps to add_generation_prompt=false
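      Loading sketch (the path-argument shape is an assumption):

          my $template = LLM::Chat::Template::Jinja2.from-tokenizer-config(
              'models/example/tokenizer_config.json'
          );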

0.2.0
    - Previous releases
