Here’s What We Learned Spending 9.5 Billion OpenAI Tokens in January
This past month, we ran through a staggering 9.5 billion OpenAI tokens across our projects at babylovegrowth.ai and samwell.ai. And let me tell you—this was one heck of a learning experience. We managed to slash our costs by 40%, and I’m here to share exactly how we did it!
1. Pick the Right Model — It’s a Game-Changer
At first, we were using GPT-4 for literally everything (yeah… big mistake). Turns out, most of our use cases didn’t need that level of power. So, we switched over to GPT-4o mini, which costs just $0.15 per million input tokens and $0.60 per million output tokens. Performance-wise? Nearly identical for our needs. Cost savings? Massive.
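To make this concrete, here's a minimal sketch of what a gpt-4o-mini call looks like with the OpenAI Python SDK. The prompts here are made-up placeholders, not our production prompts:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Route everyday tasks to the cheap model; save the big one for the hard cases.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # $0.15 / 1M input tokens, $0.60 / 1M output tokens
    messages=[
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": "Summarize this paragraph in one sentence: ..."},
    ],
)
print(response.choices[0].message.content)
```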
2. Prompt Caching — A Hidden Goldmine
We stumbled upon this one and it was a total game-changer. OpenAI automatically caches the static prefix of prompts it has seen recently, meaning repeated requests get processed much faster and cheaper. We saw up to 80% lower latency and cut costs on long prompts by about 50%. The trick? Put the static parts of your prompt first and the dynamic parts at the end, so the system can recognize and reuse the cached prefix.
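OpenAI's docs note that caching only kicks in on prompt prefixes past roughly 1,024 tokens, so the layout matters. A minimal sketch, assuming a long static instruction block (the STATIC_INSTRUCTIONS content is a stand-in for whatever unchanging prompt you reuse):

```python
from openai import OpenAI

client = OpenAI()

# Static part first: caching matches on the prompt *prefix*, so keep this
# block byte-identical across requests (and long enough to be cache-eligible).
STATIC_INSTRUCTIONS = """You are an SEO article reviewer.
<... a long, unchanging block of rules and few-shot examples ...>"""

def review(article_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_INSTRUCTIONS},  # cacheable prefix
            {"role": "user", "content": article_text},           # dynamic suffix
        ],
    )
    return response.choices[0].message.content
```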
3. Set Up Billing Alerts — Seriously!
We learned this lesson the hard way. Our budget for the month? Gone in just 17 days. Avoid the shock—enable billing alerts and keep a close eye on your usage.
4. Minimize Output Tokens — They're 4x the Cost!
Output tokens are way more expensive than input tokens, so we restructured our responses. Instead of having the model generate long-form text, we had it return position numbers and categories, then mapped those in our code. This simple tweak cut our output token usage by about 70%—plus, it sped things up significantly.
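Here's a rough sketch of the pattern, with hypothetical category names and an illustrative output format; the exact scheme is whatever your code can map back:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical category table; the model only ever outputs the short keys.
CATEGORIES = {1: "Tutorial", 2: "Product Update", 3: "Case Study", 4: "Opinion"}

prompt = (
    "Classify each headline below. Reply with one line per headline, "
    "formatted as '<headline number>:<category number>'. Categories: "
    + "; ".join(f"{k}={v}" for k, v in CATEGORIES.items())
)

headlines = ["How to batch OpenAI requests", "Our January changelog"]
numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(headlines, 1))

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{prompt}\n\n{numbered}"}],
)

# Map the terse "1:2" style output back to full labels in code,
# instead of paying for the model to spell the labels out.
for line in response.choices[0].message.content.strip().splitlines():
    idx, cat = (part.strip() for part in line.split(":", 1))
    print(headlines[int(idx) - 1], "->", CATEGORIES[int(cat)])
```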
5. Batch Your Requests — Efficiency is Key
Initially, we were making separate API calls for every little task. Big mistake. Now, we batch related tasks into a single request. Instead of doing:
Request 1: "Analyze the sentiment"
Request 2: "Extract keywords"
Request 3: "Categorize"
We now do:
Request 1:
"1. Analyze sentiment
2. Extract keywords
3. Categorize"
It’s cleaner, faster, and way more cost-efficient.
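Concretely, the combined call might look something like this (the prompt wording and output format are ours to illustrate, not a canonical recipe):

```python
from openai import OpenAI

client = OpenAI()

text = "The new dashboard is fantastic, though setup took a while."

# One request instead of three: sentiment, keywords, and category together.
combined_prompt = f"""For the text below, do all three tasks:
1. Analyze sentiment (positive/negative/neutral)
2. Extract up to 5 keywords
3. Categorize (feedback/question/complaint)

Answer as three labeled lines: SENTIMENT:, KEYWORDS:, CATEGORY:.

Text: {text}"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": combined_prompt}],
)
print(response.choices[0].message.content)
```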
6. Batch API for Non-Urgent Tasks — A No-Brainer
OpenAI’s Batch API was a lifesaver. We moved all our overnight processing there and slashed costs by 50%. The 24-hour turnaround time is totally worth it for non-time-sensitive workloads.
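For reference, here's a minimal sketch of submitting a batch with the official SDK; the task list and file name are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()

# Each JSONL line is one chat completion request; custom_id lets you
# match results back to your tasks when the batch finishes.
tasks = ["Summarize article 1 ...", "Summarize article 2 ..."]
with open("overnight_batch.jsonl", "w") as f:
    for i, task in enumerate(tasks):
        f.write(json.dumps({
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": task}],
            },
        }) + "\n")

# Upload the file, then submit the batch with a 24-hour completion window.
batch_file = client.files.create(
    file=open("overnight_batch.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) later
```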
Bonus Tips from Someone Spending 2 Billion Tokens per Month
- Trim Your Prompts — Most prompts can be shorter than you think. If your input is heavily prompt-weighted, cutting 20% of unnecessary text can save big bucks (see the token-counting sketch after this list).
- Ditch JSON When Possible — For simpler outputs, plain text is often more efficient and gets better results. The model works better when it’s not overly constrained.
- Retry Before Upscaling — Instead of jumping to a bigger model, let a smaller one take a stab at it first. If the response isn’t good enough, retry before falling back on a more expensive model. The cheaper models do the job about 80% of the time! (See the fallback sketch after this list.)
- Translate Prompts for Better Foreign Language Outputs — Instead of just telling the model to respond in another language, translate your prompt first and cache it. You’ll get significantly better results.
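On trimming prompts: a quick way to measure what a trim actually saves is to count tokens locally with tiktoken. A sketch, assuming you keep prompt versions in files (the file names are hypothetical):

```python
import tiktoken

# o200k_base is the tokenizer used by the gpt-4o model family.
enc = tiktoken.get_encoding("o200k_base")

def token_count(text: str) -> int:
    return len(enc.encode(text))

before = open("prompt_v1.txt").read()
after = open("prompt_v2_trimmed.txt").read()

saved = token_count(before) - token_count(after)
print(f"Trimmed {saved} tokens per request")
# At $0.15 per 1M input tokens, multiply by request volume to see the savings.
print(f"~${saved * 0.15 / 1_000_000:.6f} saved per request on gpt-4o-mini input")
```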
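And on retrying before upscaling: a minimal sketch of the escalation pattern, assuming gpt-4o as the fallback model and a placeholder looks_good() quality check:

```python
from openai import OpenAI

client = OpenAI()

def looks_good(answer: str) -> bool:
    # Stand-in quality check; in practice this might be a length check,
    # a regex on the expected format, or a separate validation pass.
    return len(answer.strip()) > 0 and "I'm not sure" not in answer

def ask(prompt: str, retries: int = 2) -> str:
    # Let the cheap model take a couple of swings first...
    for _ in range(retries):
        answer = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        if looks_good(answer):
            return answer
    # ...and only pay for the big model when it keeps missing.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```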
Source: Reddit
Follow us on X for more content!