bfx-ingest
Turn your files into clean AI context · right here in your browser
Feeding files to an AI is messy: you paste them by hand, blow the context window, and get a different blob every time. Drop your files below and get one clean, token-counted context file for any model, split into parts when it is too big. Nothing is uploaded. It all runs in your browser.

Drop your files in

Files or a whole folder. Pick a format and a size for your AI, then download the context. Junk like node_modules, .git, images, and binaries is skipped automatically.

Drop files or a folder here
Text and code only. Your files never leave this page.

The same files in give the same output and the same content fingerprint every time. There are no timestamps and no randomness, so the result is reproducible. The token figure is an estimate from the common chars/4 heuristic, not a model-specific tokenizer count. The size budget covers the context file only, so leave your model room for your prompt and its reply.
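As a rough illustration of the chars/4 estimate and of splitting a context into parts under a size budget, here is a minimal Python sketch. It is not the tool's own code: the function names, the round-up rule, the 1 KB = 1024 bytes convention, and the choice to break on line boundaries are all assumptions made for the example.

```python
import math

CHARS_PER_TOKEN = 4  # the chars/4 heuristic; real tokenizers will differ


def estimate_tokens(text: str) -> int:
    """Estimate tokens as character count / 4, rounded up (rounding is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_into_parts(text: str, max_kb: int) -> list[str]:
    """Split text into parts that aim to stay within max_kb kilobytes of UTF-8.

    Hypothetical helper: it breaks only between lines, so a single line
    larger than the budget ends up alone in an oversized part.
    """
    limit = max_kb * 1024  # assumes 1 KB = 1024 bytes
    parts: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        line_bytes = len(line.encode("utf-8"))
        # Close the current part before it would exceed the budget.
        if current and size + line_bytes > limit:
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        parts.append("".join(current))
    return parts
```

Joining the parts back together reproduces the original text exactly, and the same input always yields the same parts, which is the property the reproducibility claim depends on.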

Free for early adopters. This tool is free while it is finding its feet. Early users keep free access; if it gains traction, later versions may add paid tiers.

Why it helps

Done by hand, assembling context is slow, incomplete, and non-reproducible: you forget files, send the same vendored code twice, overflow the window, and can never recreate yesterday's exact input. That breaks evals, wastes tokens, and misses the prompt cache, which only pays off when the repeated input is identical.

bfx-ingest reads your files, fingerprints each one, collapses identical duplicates, and emits one artifact in the shape your model wants, with a token count and a reproducible root fingerprint. Same files in, same bytes out.
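The fingerprint, dedupe, and root-fingerprint steps can be sketched roughly as follows. This is an illustrative Python sketch, not bfx-ingest's implementation: the use of SHA-256, the manifest field names, and the way the root hash is built from sorted path and hash pairs are assumptions chosen to show why the result does not depend on input order.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint of one file (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict:
    """Fingerprint each file, mark identical duplicates, and derive a root hash."""
    entries = []
    first_seen: dict[str, str] = {}  # content hash -> first path with that content
    for path in sorted(files):  # sorted, so input order never changes the result
        digest = fingerprint(files[path])
        duplicate_of = first_seen.get(digest)
        if duplicate_of is None:
            first_seen[digest] = path
        entries.append(
            {"path": path, "sha256": digest, "duplicate_of": duplicate_of}
        )

    # Root fingerprint: a hash over every (path, content hash) pair in order.
    root = hashlib.sha256()
    for entry in entries:
        root.update(f"{entry['path']}\x00{entry['sha256']}\n".encode("utf-8"))
    return {"files": entries, "root": root.hexdigest()}
```

An emitter built on a manifest like this could write the content of each duplicate once and reference it afterwards. Because nothing in the manifest depends on time or traversal order, the same files always produce the same root.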

Measured on a real 1,593-file repo

Numbers from actually running the command-line version, not a pitch. Counts are exact; token figures are labelled estimates; the cache-hit figure is illustrative.

1,593 files (18.2M characters) folded into one artifact, no hand-pasting
~4.55M tokens measured up front, so you scope before wasting a call
Same hash: identical input gives the identical root fingerprint, verified by re-running
~85% of repeated-input cost turned into prompt-cache hits per session (illustrative)

Token figures use the standard chars/4 heuristic; file, byte, and duplicate counts are exact. Every measured figure is reproducible by re-running the included benchmark.

Prefer the command line?

The same tool runs as one command, zero dependencies, for scripting and CI.

# whole repo into Markdown context
npx bfx-ingest ./my-project --format md --out context.md

# Claude prefers XML
npx bfx-ingest ./src --format xml --out context.xml

# JSON for your own RAG / eval pipeline
npx bfx-ingest ./src --format json --out ctx.json

Options: --format md|xml|json, --out FILE, --max-kb N. Skips .git, node_modules, build output, binaries, and lockfiles.
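To make the skip rules concrete, here is a small Python sketch of a path filter of this kind. The directory names, lockfile names, and file extensions listed are examples picked for illustration, not the tool's actual rule set, and the NUL-byte check is just one common heuristic for spotting binary content.

```python
from pathlib import PurePosixPath

# Illustrative rule sets; the real tool's lists are not given in this page.
SKIP_DIRS = {".git", "node_modules", "dist", "build"}
SKIP_NAMES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}
SKIP_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".zip", ".exe"}


def should_skip(path: str) -> bool:
    """Return True if a forward-slash relative path matches a skip rule."""
    p = PurePosixPath(path)
    # Any parent directory on the skip list excludes the whole subtree.
    if any(part in SKIP_DIRS for part in p.parts[:-1]):
        return True
    if p.name in SKIP_NAMES:
        return True
    return p.suffix.lower() in SKIP_SUFFIXES


def looks_binary(sample: bytes) -> bool:
    """Heuristic: treat content as binary if its leading bytes contain NUL."""
    return b"\x00" in sample
```

A caller would typically apply `should_skip` to each path first and `looks_binary` to the first few kilobytes of any file that remains.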

Why you can trust it

No black box

The browser tool and the CLI are small, readable, MIT-licensed plain code. Read it before you run it. View source is right there.

Nothing uploaded

The browser version reads your files locally and never sends them anywhere. The CLI is pure Node, no network.

Verify, don't trust

The same files in give the same fingerprint out. Re-run it and check, instead of taking my word for it.
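One way to check this yourself is to generate the context twice and compare the two files byte for byte. The Python sketch below does the comparison by hashing both outputs; the helper names are made up for the example, and SHA-256 is simply a convenient choice here, not necessarily the hash the tool uses internally.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_output(first: str, second: str) -> bool:
    """True when two generated context files are byte-identical."""
    return file_sha256(first) == file_sha256(second)
```

For example, run the CLI twice on the same folder with `--out` pointing at two different files (the file names are up to you), then call `same_output` on the pair. Any difference between the two runs means the output was not reproducible.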

Reproducible by design

The fingerprint and manifest record exactly which files and bytes went into the context, for evals, caching, and provenance.

About this project

bfx-ingest is free and open source, a personal portfolio piece built to demonstrate the reliable, reproducible, cost-efficient AI systems I build. It is not a commercial product or a service for sale. I am seeking a full-time AI engineering or backend / platform role.