The changelog that writes itself
Weirdly, I like reading changelogs, but a wall of fix: and chore: has all the charm of a bank statement.
I've been exploring the world of local LLMs for a while, and I wanted to see if I could run a small model as part of a CI run in a GitHub Action. I like the idea of not relying on a cloud API, or paying for this kind of unsolicited fun, so I set about putting together an action that downloads a language model onto a CI runner, reads the team's merged PRs from the past week, and writes a summary in the style of a sports commentator. Pretty spicy stuff!
Now, every Friday afternoon, something like this lands in the team channel:
🗞️ The Weekly Scoop
19 June – 26 June | 71 PRs merged by 9 contributors
🍦 THIS WEEK'S SCOOPS
What a week on the vans. Mabel Scoops was unstoppable in the checkout flow, landing a hat-trick: she sorted the penny-rounding bug that had been short-changing the till, wired up refunds for returned tubs so the books finally balance, and tidied the receipt formatting on the way past. Over in stock, Sam Sprinkles kept the cold chain honest, shipping the low-freezer alerts that page the depot before a van runs dry, plus a refactor of the route planner that shaved real time off the morning dispatch.
Priya Ripple took the defensive honours, hardening the menu API against the lunchtime rush with proper rate limits and caching, while Niall Ninetynine laid the groundwork for the new loyalty scheme. And a quiet word for BrainFreeze, the night-shift bot keeping the docs in step with every merge without being asked.
🤖 Generated by a local LLM running in GitHub Actions. No data leaves the runner.
This is waaaay more fun to read after a full week of shipping code. Here's how to build it:
Ingredients
We need five things:
- A repo with merged pull requests. GitHub Actions hands every job a built-in GITHUB_TOKEN by default, so listing the week's PRs needs no extra secret.
- llama.cpp. The inference engine that runs the model. They publish prebuilt binaries on their releases page, so we can download one instead of needing to compile anything.
- A small GGUF model. I really like unsloth/gemma-4-E2B-it-GGUF for things like summaries. Q4_K_M quantisation is small enough to run on CPU on a stock runner. Any GGUF model that fits on your box will work though.
- A Slack incoming webhook (if you want Slack output), stored as a repo secret named SLACK_WEBHOOK_URL. Point it wherever you like, or swap it for a different output later.
- About five minutes of a runner, once a week. Should be free on public repos and literally pennies on private ones.
So the only thing you set up by hand is that one secret, SLACK_WEBHOOK_URL. The GITHUB_TOKEN is automatic, and everything else worth tweaking (the model repo, the model file, and the llama.cpp version) sits in plain env: at the top of the workflow.
Method
We pull the data (i.e. the week's PRs), hand it to a small local model to summarise, then post the result.
1. Schedule it
A workflow that runs on a cron schedule and also offers a manual trigger, so we can test without waiting until Friday.
name: Weekly changelog
on:
  schedule:
    - cron: "0 17 * * 5" # 17:00 UTC every Friday
  workflow_dispatch: {} # enables manual triggers
permissions:
  contents: read
  pull-requests: read
jobs:
  changelog:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    env:
      MODEL_REPO: unsloth/gemma-4-E2B-it-GGUF
      MODEL_FILE: gemma-4-E2B-it-Q4_K_M.gguf
      LLAMA_VERSION: b8808
2. Fetch and cache the engine and the model
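Our downloads are pretty big, so to save time and (very little) money, make sure to cache them. Otherwise every Friday we're re-downloading the same 3 GB model for no reason. actions/cache keyed on the version and filename means we download once and reuse it on later runs. One caveat: GitHub evicts cache entries that haven't been accessed in over seven days, so a strictly weekly job sits right on that boundary and may occasionally have to download again.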
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-node@v6
        with: { node-version: 24 }
      - name: Cache llama.cpp
        id: cache-llama
        uses: actions/cache@v5
        with:
          path: ~/llama
          key: llama-${{ env.LLAMA_VERSION }}-linux-x64
      - name: Download llama.cpp
        if: steps.cache-llama.outputs.cache-hit != 'true'
        run: |
          mkdir -p ~/llama
          curl -fSL "https://github.com/ggml-org/llama.cpp/releases/download/${LLAMA_VERSION}/llama-${LLAMA_VERSION}-bin-ubuntu-x64.tar.gz" \
            | tar -xz -C ~/llama
      - name: Cache the model
        id: cache-model
        uses: actions/cache@v5
        with:
          path: ~/models
          key: model-${{ env.MODEL_FILE }}
      - name: Download the model
        if: steps.cache-model.outputs.cache-hit != 'true'
        run: |
          mkdir -p ~/models
          curl -fSL "https://huggingface.co/${MODEL_REPO}/resolve/main/${MODEL_FILE}" \
            -o ~/models/${MODEL_FILE}
We're using curl with these flags to handle our downloads: -f fails on an HTTP error instead of saving the error page, -S shows an error if it does, and -L follows the redirect that both GitHub and Hugging Face send us through.
3. Pull the week's merged PRs
I've opted for a Node script here (changelog.mjs) which asks the API for recently closed PRs, then filters them down to the ones merged to main in the last seven days:
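const repo = process.env.GITHUB_REPOSITORY; // owner/name, set by Actions
const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString(); // 7 days ago

// Call the GitHub REST API and fail loudly on a non-2xx response
const gh = async (path) => {
  const res = await fetch(`https://api.github.com${path}`, {
    headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` }
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status} for ${path}`);
  return res.json();
};

const closed = await gh(`/repos/${repo}/pulls?state=closed&sort=updated&direction=desc&per_page=100`);
// Compare as dates rather than strings, since the two timestamps differ in precision
const merged = closed.filter((pr) => pr.merged_at && new Date(pr.merged_at) >= new Date(since) && pr.base.ref === "main");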
closed covers both merged and abandoned PRs, so the merged_at check is what separates work that shipped from work that didn't.
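One thing a single request can't cover is a really busy week: per_page tops out at 100, and closed-but-unmerged PRs eat into that too. Here's a minimal sketch of following the Link response header GitHub sends on paginated results. The names nextPageUrl and fetchAllPages, the injectable fetchImpl argument and the maxPages cap are all my own assumptions for illustration, not part of the original script:

```javascript
// Hypothetical helper: pull the rel="next" URL out of a Link header, or null.
function nextPageUrl(linkHeader) {
  const match = /<([^>]+)>\s*;\s*rel="next"/.exec(linkHeader ?? "");
  return match ? match[1] : null;
}

// Hypothetical helper: keep requesting pages until there is no "next" link.
// fetchImpl is injectable so the loop can be exercised without the network;
// maxPages is an arbitrary safety cap chosen for this sketch.
async function fetchAllPages(firstUrl, headers, fetchImpl = fetch, maxPages = 5) {
  const items = [];
  let url = firstUrl;
  for (let page = 0; url && page < maxPages; page++) {
    const res = await fetchImpl(url, { headers });
    if (!res.ok) throw new Error(`GitHub API returned ${res.status} for ${url}`);
    items.push(...(await res.json()));
    url = nextPageUrl(res.headers.get("link"));
  }
  return items;
}
```

In the script, something like fetchAllPages would stand in for the single gh(...) call, with the same Authorization header passed as headers. The merged_at filter stays exactly as it is.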
4. Give it personality (AKA prompt it)
We want this to be lighthearted and a little cheesy, so alongside the merged PRs and a merge count per author, we give the LLM a tone of voice to work with:
You are a witty engineering newsletter writer with the energy of a sports commentator. Write 2-3 short, fun paragraphs on what the team shipped this week. Mention the top contributors by name. Keep it under 300 words, with no headings or lists. Focus on what was actually built and shipped: do not end on generic motivational fluff or vague praise.
const list = merged.map((pr) => `- "${pr.title}" by ${pr.user.login}`).join("\n");
const instruction = "You are a witty engineering newsletter writer with the energy of a ..."; // the rest of the prompt
Left to its own devices a model signs off every summary with something like "keep up the great work, team!", which grates after a while, so that's why the prompt ends with "...do not end on generic motivational fluff or vague praise."
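The merge count per author isn't shown in the snippet above, so here's one way it could look. This is a sketch under my own assumptions: mergeCounts and leaderboard are hypothetical names, and the sample data only mimics the two fields the script already reads (title and user.login):

```javascript
// Hypothetical helper: tally merged PRs per author, busiest first.
function mergeCounts(merged) {
  const counts = new Map();
  for (const pr of merged) {
    const login = pr.user.login;
    counts.set(login, (counts.get(login) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical helper: render the tally as lines to append under the PR list.
function leaderboard(merged) {
  return mergeCounts(merged)
    .map(([login, n]) => `- ${login}: ${n} merged`)
    .join("\n");
}

// Made-up sample data, shaped like the API fields used above.
const sample = [
  { title: "Fix penny rounding", user: { login: "mabel" } },
  { title: "Add refunds", user: { login: "mabel" } },
  { title: "Low-freezer alerts", user: { login: "sam" } }
];
console.log(leaderboard(sample));
// - mabel: 2 merged
// - sam: 1 merged
```

Appending the leaderboard text after the PR list gives the model something concrete to go on when the prompt asks it to mention the top contributors.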
5. Run the model
We assemble the full prompt in the model's turn format, write it to the file llama.cpp will read, then run the binary and capture its stdout:
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// wrap the instruction and PR list in the turn markers the model card lists
const prompt = `<|turn>user\n${instruction}\n\n${list}<turn|>\n<|turn>model\n`;
writeFileSync("prompt.txt", prompt);

const reply = execFileSync(
  process.env.LLAMA_CLI,
  [
    "--model", process.env.MODEL_PATH,
    "--file", "prompt.txt",
    "--ctx-size", "8192",
    "--n-predict", "2048",
    "--temp", "0.7",
    "--threads", "2",
    "--no-display-prompt",
    "-no-cnv"
  ],
  { encoding: "utf-8", timeout: 300_000 }
);
What each flag does
- --model points at the GGUF file we downloaded earlier.
- --file prompt.txt reads the prompt from a file rather than the command line, so the newlines and quotes survive intact.
- --ctx-size 8192 is the context window: it has to hold the prompt plus the reply.
- --n-predict 2048 caps the tokens generated, so a model that starts rambling still stops.
- --temp 0.7 is the sampling temperature: a little randomness for personality, not a lot.
- --threads 2 is how many CPU threads to use; a stock runner has two to four.
- --no-display-prompt prints only the reply, not the prompt echoed back in front of it.
- -no-cnv runs one-shot generation, not an interactive chat session.
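Because the context window has to hold the prompt plus the reply, a monster week of PR titles could in theory crowd out the answer. Here's a rough sketch of trimming the list to fit. The helper name fitToContext is hypothetical, and the four-characters-per-token figure is only a rule-of-thumb assumption; the real count depends on the model's tokenizer:

```javascript
// Assumption: ~4 characters per token is a rough rule of thumb, not a
// measurement. The real figure depends on the model's tokenizer.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: keep as many PR lines as fit in the prompt's share of
// the context window (ctxSize minus the tokens reserved for the reply).
function fitToContext(instruction, lines, ctxSize = 8192, nPredict = 2048) {
  const budgetChars = (ctxSize - nPredict) * CHARS_PER_TOKEN;
  let used = instruction.length;
  const kept = [];
  for (const line of lines) {
    const cost = line.length + 1; // +1 for the newline that joins the lines
    if (used + cost > budgetChars) break;
    kept.push(line);
    used += cost;
  }
  return { kept, dropped: lines.length - kept.length };
}
```

With the defaults, that leaves roughly 6,000 tokens for the instruction and PR list, which is far more than a typical week needs, so treat this as a guard rail rather than something you'd expect to trigger.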
CPU inference is slow and this summary can take a few minutes, but we don't care because it's a weekly job that runs in the background. Q4_K_M should fit on a standard runner comfortably, but a bigger quantisation or model would give a better output (if you have the resources). If you're hitting the five-minute timeout we set on execFileSync (timeout: 300_000), you should swap in a smaller model or a bigger runner.
Each model wraps a conversation in its own special tokens (stuff like <|turn>...<turn|>, <|im_start|>, etc.), so we need to wrap the prompt in the format the model card specifies, and strip the markers out with a quick .replace() before we post it.
An example clean()
Gemma wraps its turns in <turn|> and <|turn>, and llama.cpp can tack on an [end of text] marker and a stats line. A handful of .replace() calls clear the lot, then we tidy the blank lines:
function clean(output) {
  return output
    .replace(/<turn\|>/g, "")
    .replace(/<\|turn>/g, "")
    .replace(/\[end of text\]/g, "")
    .replace(/\[ Prompt:.*?Generation:.*?\]/g, "")
    .trim()
    .replace(/\n{1,}/g, "\n\n");
}
6. Clean it up and post it
After stripping the special tokens we send the result wherever it's going. Slack takes a JSON payload on the webhook URL:
const res = await fetch(process.env.SLACK_WEBHOOK_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    blocks: [
      { type: "header", text: { type: "plain_text", text: "🗞️ The Weekly Scoop" } },
      { type: "section", text: { type: "mrkdwn", text: clean(reply) } },
      { type: "context", elements: [{ type: "mrkdwn", text: "🤖 _Generated by a local LLM running in GitHub Actions._" }] }
    ]
  })
});
// fetch only rejects on network errors, so check the HTTP status ourselves
if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
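Slack caps the text of a single section block at 3,000 characters. A 300-word summary sits well under that, but if you loosen the word limit it's worth splitting the reply across several sections. Here's a minimal sketch, assuming a hypothetical toSections helper that packs paragraphs into chunks:

```javascript
const SECTION_LIMIT = 3000; // Slack's cap on a section block's text length

// Hypothetical helper: pack paragraphs into chunks that each fit one section.
function toSections(text, limit = SECTION_LIMIT) {
  const chunks = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    // Slice the paragraph so one that is too long on its own still fits.
    for (let i = 0; i < para.length; i += limit) {
      const piece = para.slice(i, i + limit);
      const joined = current ? `${current}\n\n${piece}` : piece;
      if (joined.length > limit && current) {
        chunks.push(current); // current chunk is full, start a new one
        current = piece;
      } else {
        current = joined;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks.map((chunk) => ({
    type: "section",
    text: { type: "mrkdwn", text: chunk }
  }));
}
```

The single section entry in the blocks array would then become ...toSections(clean(reply)), with the header and context blocks left where they are.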
All of this runs as the workflow's final step, where we hand the script the things it can't pick up on its own. GITHUB_REPOSITORY is already in the environment. GITHUB_TOKEN exists too, but it isn't exposed to the script until we map it in. The webhook and the two file paths are entirely ours to provide:
      - name: Generate and post the changelog
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          export LLAMA_CLI="$(find ~/llama -name 'llama-cli' -type f | head -1)"
          export MODEL_PATH=~/models/${{ env.MODEL_FILE }}
          node changelog.mjs
The token and webhook are secrets, so they go in env:. The two paths are set in the shell instead, where ~ expands to a path llama.cpp can open (a literal ~ left in an env value would not). Add SLACK_WEBHOOK_URL to the repo's secrets, fire it once from the manual trigger, and boom, everyone at work now thinks you're really cool.
Chef's notes
A few considerations about this whole setup. It's a nice overview and something fun, but it's still a small local model summarising the week based on pull requests, so it can get things wrong, and could occasionally oversell a one-line fix into some heroic campaign that saved the company from going bust.
That aside, you can tweak this recipe however you like. It doesn't have to just be a cringe engineering sports broadcaster. Point it at commits or closed issues instead of PRs, draft release notes, triage the backlog, label what just merged. Swap Slack for a committed CHANGELOG.md or a GitHub Discussion, give it a different voice, or feed it a bigger model on a beefier box. The YAML here is GitHub Actions, but the concept ports to any CI runner.