notes to self

The changelog that writes itself

Weirdly, I like reading changelogs, but a wall of fix: and chore: has all the charm of a bank statement.

I've been exploring the world of local LLMs for a while, and wanted to see if I could run a small model as part of a CI run in a GitHub Action. I like the idea of not relying on a cloud API, or paying for this kind of unsolicited fun, so I set about putting together an action that downloads a language model onto a CI runner, reads the team's PRs merged over the past week, and writes a summary in the style of a sports commentator. Pretty spicy stuff!

Now, every Friday afternoon, something like this lands in the team channel:

🗞️ The Weekly Scoop


19 June – 26 June   |   71 PRs merged by 9 contributors


🍦 THIS WEEK'S SCOOPS

What a week on the vans. Mabel Scoops was unstoppable in the checkout flow, landing a hat-trick: she sorted the penny-rounding bug that had been short-changing the till, wired up refunds for returned tubs so the books finally balance, and tidied the receipt formatting on the way past. Over in stock, Sam Sprinkles kept the cold chain honest, shipping the low-freezer alerts that page the depot before a van runs dry, plus a refactor of the route planner that shaved real time off the morning dispatch.

Priya Ripple took the defensive honours, hardening the menu API against the lunchtime rush with proper rate limits and caching, while Niall Ninetynine laid the groundwork for the new loyalty scheme. And a quiet word for BrainFreeze, the night-shift bot keeping the docs in step with every merge without being asked.


🤖 Generated by a local LLM running in GitHub Actions. No data leaves the runner.

This is waaaay more fun to read after a full week of shipping code. Here's how to build it:

Ingredients

We need five things:

So the only thing you set up by hand is that one secret, SLACK_WEBHOOK_URL. The GITHUB_TOKEN is automatic, and everything else worth tweaking (the model repo, the model file, and the llama.cpp version) sits in plain env: at the top of the workflow.

Method

We pull the data (i.e. the week's PRs), hand it to a small local model to summarise, then post the result.

1. Schedule it

A workflow that runs on a cron schedule and also offers a manual trigger, so we can test without waiting until Friday.

name: Weekly changelog

on:
  schedule:
    - cron: "0 17 * * 5" # 17:00 UTC every Friday
  workflow_dispatch: {} # enables manual triggers

permissions:
  contents: read
  pull-requests: read

jobs:
  changelog:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    env:
      MODEL_REPO: unsloth/gemma-4-E2B-it-GGUF
      MODEL_FILE: gemma-4-E2B-it-Q4_K_M.gguf
      LLAMA_VERSION: b8808

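2. Fetch and cache the engine and the model

Our downloads are pretty big, so to save time and (very little) money, we cache them. Otherwise every Friday we're re-downloading the same 3 GB model for no reason. With actions/cache keyed on the version and filename, we download once and reuse the files until the key changes or GitHub evicts the cache (entries that go unused for seven days are dropped, so a weekly job sits right on that edge and the occasional re-download is normal).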

steps:
  - uses: actions/checkout@v6
  - uses: actions/setup-node@v6
    with: { node-version: 24 }

  - name: Cache llama.cpp
    id: cache-llama
    uses: actions/cache@v5
    with:
      path: ~/llama
      key: llama-${{ env.LLAMA_VERSION }}-linux-x64

  - name: Download llama.cpp
    if: steps.cache-llama.outputs.cache-hit != 'true'
    run: |
      mkdir -p ~/llama
      curl -fSL "https://github.com/ggml-org/llama.cpp/releases/download/${LLAMA_VERSION}/llama-${LLAMA_VERSION}-bin-ubuntu-x64.tar.gz" \
        | tar -xz -C ~/llama

  - name: Cache the model
    id: cache-model
    uses: actions/cache@v5
    with:
      path: ~/models
      key: model-${{ env.MODEL_FILE }}

  - name: Download the model
    if: steps.cache-model.outputs.cache-hit != 'true'
    run: |
      mkdir -p ~/models
      curl -fSL "https://huggingface.co/${MODEL_REPO}/resolve/main/${MODEL_FILE}" \
        -o ~/models/${MODEL_FILE}

We're using curl with these flags to handle our downloads: -f fails on an HTTP error instead of saving the error page, -S shows an error if it does, and -L follows the redirect that both GitHub and Hugging Face send us through.

3. Pull the week's merged PRs

I've opted for a Node script here (changelog.mjs), which asks the API for recently closed PRs, then filters them down to the ones merged to main in the last seven days:

const repo = process.env.GITHUB_REPOSITORY; // owner/name, set by Actions
const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString(); // 7 days ago

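const gh = (path) =>
  fetch(`https://api.github.com${path}`, {
    headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` }
  }).then((r) => {
    // fail loudly instead of passing an error body to .filter() below
    if (!r.ok) throw new Error(`GitHub API returned ${r.status} for ${path}`);
    return r.json();
  });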

const closed = await gh(`/repos/${repo}/pulls?state=closed&sort=updated&direction=desc&per_page=100`);
const merged = closed.filter((pr) => pr.merged_at && pr.merged_at >= since && pr.base.ref === "main");

closed covers both merged and abandoned PRs, so the merged_at check is what separates work that shipped from work that didn't.
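
One request returns at most 100 closed PRs, so a busy week (or a burst of comments on old PRs) could push merged work off the first page. Here's a rough sketch of paging through results until they fall outside the window. It assumes a hypothetical fetchPage(page) helper, which is not part of the original script, that wraps the same request with &page=N appended:

```javascript
// Hypothetical helper: fetchPage(page) resolves to one page of closed PRs,
// sorted by most recently updated first (same query as the request above).
async function collectMerged(fetchPage, since, base = "main", maxPages = 10) {
  const merged = [];
  for (let page = 1; page <= maxPages; page++) {
    const prs = await fetchPage(page);
    if (prs.length === 0) break; // ran out of PRs
    for (const pr of prs) {
      if (pr.merged_at && pr.merged_at >= since && pr.base.ref === base) {
        merged.push(pr);
      }
    }
    // A PR is always updated at or after its merge, so once a page ends
    // before the window, later pages cannot hold anything merged inside it.
    if (prs[prs.length - 1].updated_at < since) break;
  }
  return merged;
}
```

In the script this could be called as collectMerged((page) => gh(`/repos/${repo}/pulls?state=closed&sort=updated&direction=desc&per_page=100&page=${page}`), since). The maxPages cap is an arbitrary safety net, not a tuned value.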

4. Give it personality (AKA prompt it)

We want this to be lighthearted and a little cheesy, so let's prime the LLM with a tone of voice. Alongside the merged PRs and a merge count per author, it gets this instruction:

You are a witty engineering newsletter writer with the energy of a sports commentator. Write 2-3 short, fun paragraphs on what the team shipped this week. Mention the top contributors by name. Keep it under 300 words, with no headings or lists. Focus on what was actually built and shipped: do not end on generic motivational fluff or vague praise.

const list = merged.map((pr) => `- "${pr.title}" by ${pr.user.login}`).join("\n");

const instruction = "You are a witty engineering newsletter writer with the energy of a ..."; // the rest of the prompt

Left to its own devices, a model signs off every summary with something like "keep up the great work, team!", which grates after a while. That's why the prompt ends with "...do not end on generic motivational fluff or vague praise."
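
The prompt is meant to carry a merge count per author as well as the PR list, but the snippet above only builds the list. A minimal sketch of the tally, assuming the same pr.user.login field the list uses (countByAuthor and the output wording are my own placeholders, not the original script):

```javascript
// Tally merges per author, then sort so the busiest contributors come first.
function countByAuthor(merged) {
  const counts = new Map();
  for (const pr of merged) {
    const login = pr.user.login;
    counts.set(login, (counts.get(login) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([login, n]) => `- ${login}: ${n} merged`)
    .join("\n");
}
```

The result can be appended to the prompt under the PR list, which gives the model something concrete to lean on when it picks the "top contributors".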

5. Run the model

We assemble the full prompt in the model's turn format, write it to the file llama.cpp will read, then run the binary and capture its stdout:

import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// wrap the instruction and PR list in the turn markers the model card lists
const prompt = `<|turn>user\n${instruction}\n\n${list}<turn|>\n<|turn>model\n`;
writeFileSync("prompt.txt", prompt);

const reply = execFileSync(
  process.env.LLAMA_CLI,
  [
    "--model",
    process.env.MODEL_PATH,
    "--file",
    "prompt.txt",
    "--ctx-size",
    "8192",
    "--n-predict",
    "2048",
    "--temp",
    "0.7",
    "--threads",
    "2",
    "--no-display-prompt",
    "-no-cnv"
  ],
  { encoding: "utf-8", timeout: 300_000 }
);

What each flag does:
  • --model points at the GGUF file we downloaded earlier.
  • --file prompt.txt reads the prompt from a file rather than the command line, so the newlines and quotes survive intact.
  • --ctx-size 8192 is the context window: it has to hold the prompt plus the reply.
  • --n-predict 2048 caps the tokens generated, so a model that starts rambling still stops.
  • --temp 0.7 is the sampling temperature: a little randomness for personality, not a lot.
  • --threads 2 is how many CPU threads to use; a stock runner has two to four.
  • --no-display-prompt prints only the reply, not the prompt echoed back in front of it.
  • -no-cnv runs one-shot generation, not an interactive chat session.
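
Because the context window has to hold the prompt plus the reply, --ctx-size 8192 with --n-predict 2048 leaves roughly 6,144 tokens for the prompt. On a week with a lot of PRs it may be worth trimming the list before it goes in. This is only a sketch: the four-characters-per-token figure is a rule of thumb for English text, not a measurement of this model's tokenizer, and fitToBudget is a hypothetical helper:

```javascript
const CTX_SIZE = 8192; // matches --ctx-size
const N_PREDICT = 2048; // matches --n-predict
const CHARS_PER_TOKEN = 4; // assumption: rough average for English text

// Keep whole lines, in order, until the estimated token budget is spent.
// reservedTokens leaves room for the instruction and turn markers.
function fitToBudget(lines, reservedTokens = 0) {
  const budgetChars = (CTX_SIZE - N_PREDICT - reservedTokens) * CHARS_PER_TOKEN;
  const kept = [];
  let used = 0;
  for (const line of lines) {
    const cost = line.length + 1; // +1 for the newline that joins the lines
    if (used + cost > budgetChars) break;
    kept.push(line);
    used += cost;
  }
  return kept;
}
```

Since the list is built newest-first, anything dropped is the oldest work of the week, which felt like the least bad thing to lose.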

CPU inference is slow and this summary can take a few minutes, but we don't care because it's a weekly job that runs in the background. Q4_K_M should fit on a standard runner comfortably, though a higher-precision quantisation or a bigger model would give better output (if you have the resources). If you're hitting the five-minute timeout (the 300_000 ms passed to execFileSync), swap in a smaller model or a bigger runner.

Each model wraps a conversation in its own special tokens (stuff like <|turn>...<turn|>, <|im_start|>, etc.), so we need to wrap the prompt in the format the model card specifies, and strip any markers that leak into the reply with a quick .replace() before we post it.

An example clean()

Gemma wraps its turns in <|turn> and <turn|>, and llama.cpp can tack on an [end of text] marker and a stats line. A handful of .replace() calls clear the lot, then we normalise every run of newlines to a single blank line so the paragraphs stay separated:

function clean(output) {
  return output
    .replace(/<turn\|>/g, "")
    .replace(/<\|turn>/g, "")
    .replace(/\[end of text\]/g, "")
    .replace(/\[ Prompt:.*?Generation:.*?\]/g, "")
    .trim()
    .replace(/\n{1,}/g, "\n\n");
}
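
If you swap models, the markers change too, so it can help to build the cleaner from a list rather than hard-coding each regex. A minimal sketch, where makeCleaner is a hypothetical helper and the marker strings are whatever your model card lists:

```javascript
// Escape a literal string so it can be dropped into a RegExp safely.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical factory: pass in the marker strings from the model card.
function makeCleaner(markers) {
  const pattern = new RegExp(markers.map(escapeRegExp).join("|"), "g");
  return (output) => output.replace(pattern, "").trim();
}

// Same markers as clean() above, minus the stats line and newline tidying.
const cleanGemma = makeCleaner(["<|turn>", "<turn|>", "[end of text]"]);
```

This only strips literal markers. The stats-line regex and the newline tidying from clean() would still need to run afterwards.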

6. Clean it up and post it

After stripping the special tokens we send the result wherever it's going. Slack takes a JSON payload on the webhook URL:

await fetch(process.env.SLACK_WEBHOOK_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    blocks: [
      { type: "header", text: { type: "plain_text", text: "🗞️ The Weekly Scoop" } },
      { type: "section", text: { type: "mrkdwn", text: clean(reply) } },
      { type: "context", elements: [{ type: "mrkdwn", text: "🤖 _Generated by a local LLM running in GitHub Actions._" }] }
    ]
  })
});
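
One thing to watch: Slack caps the text of a section block at 3,000 characters. A 300-word summary sits well under that, but a model that ignores the word limit could get the whole message rejected. A rough sketch of splitting the reply into one section per paragraph (toSections is a hypothetical helper, and truncating with an ellipsis is just one way to handle an oversized paragraph):

```javascript
const SECTION_LIMIT = 3000; // Slack's cap on a section block's text

// One section block per paragraph; anything still too long is cut short.
function toSections(text) {
  return text
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => ({
      type: "section",
      text: {
        type: "mrkdwn",
        text: p.length > SECTION_LIMIT ? p.slice(0, SECTION_LIMIT - 1) + "…" : p
      }
    }));
}
```

To use it, swap the single section block in the payload for ...toSections(clean(reply)).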

All of this runs as the workflow's final step, where we hand the script the things it can't read from the environment automatically. GITHUB_REPOSITORY and GITHUB_TOKEN exist automatically, but the latter isn't exposed to the environment until we map it in. The webhook and the two file paths are entirely ours to provide:

- name: Generate and post the changelog
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
  run: |
    export LLAMA_CLI="$(find ~/llama -name 'llama-cli' -type f | head -1)"
    export MODEL_PATH=~/models/${{ env.MODEL_FILE }}
    node changelog.mjs

The token and webhook are secrets, so they go in env:. The two paths are set in the shell instead, where ~ expands to a path llama.cpp can open (a literal ~ left in an env value would not). Add SLACK_WEBHOOK_URL to the repo's secrets, fire it once from the manual trigger, and boom, everyone at work now thinks you're really cool.

Chef's notes

A few considerations about this whole setup. It's a nice overview and something fun, but it's still a small local model summarising the week based on pull requests, so it can get things wrong, and could occasionally oversell a one-line fix into some heroic campaign that saved the company from going bust.

That aside, you can tweak this recipe however you like. It doesn't have to just be a cringe engineering sports broadcaster. Point it at commits or closed issues instead of PRs, draft release notes, triage the backlog, label what just merged. Swap Slack for a committed CHANGELOG.md or a GitHub Discussion, give it a different voice, or feed it a bigger model on a beefier box. The YAML here is GitHub Actions, but the concept ports to any CI runner.