Here’s What We Learned Spending 9.5 Billion OpenAI Tokens in January
This past month, we ran through a staggering 9.5 billion OpenAI tokens across our projects at babylovegrowth.ai and samwell.ai. And let me tell you—this was one heck of a learning experience. We managed to slash our costs by 40%, and I’m here to share exactly how we did it!
1. Pick the Right Model — It’s a Game-Changer
At first, we were using GPT-4 for literally everything (yeah… big mistake). Turns out, most of our use cases didn’t need that level of power. So, we switched over to GPT-4o mini, which costs just $0.15 per million input tokens and $0.60 per million output tokens. Performance-wise? Nearly identical for our needs. Cost savings? Massive.
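To make this concrete, here's a minimal sketch of what a gpt-4o-mini call looks like with the OpenAI Python SDK. The prompts here are made-up placeholders, not our production prompts:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Route everyday tasks to the cheap model; save the big one for the hard cases.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # $0.15 / 1M input tokens, $0.60 / 1M output tokens
    messages=[
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": "Summarize this paragraph in one sentence: ..."},
    ],
)
print(response.choices[0].message.content)
```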
2. Prompt Caching — A Hidden Goldmine
We stumbled upon this one and it was a total game-changer. OpenAI automatically caches the static prefix of prompts it has seen recently, meaning repeated requests get processed much faster and cheaper. We saw up to 80% lower latency and cut costs on long prompts by about 50%. The trick? Put the static parts of your prompt first and the dynamic parts at the end, so the system can recognize and reuse the cached prefix.
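OpenAI's docs note that caching only kicks in on prompt prefixes past roughly 1,024 tokens, so the layout matters. A minimal sketch, assuming a long static instruction block (the STATIC_INSTRUCTIONS content is a stand-in for whatever unchanging prompt you reuse):

```python
from openai import OpenAI

client = OpenAI()

# Static part first: caching matches on the prompt *prefix*, so keep this
# block byte-identical across requests (and long enough to be cache-eligible).
STATIC_INSTRUCTIONS = """You are an SEO article reviewer.
<... a long, unchanging block of rules and few-shot examples ...>"""

def review(article_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_INSTRUCTIONS},  # cacheable prefix
            {"role": "user", "content": article_text},           # dynamic suffix
        ],
    )
    return response.choices[0].message.content
```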
3. Set Up Billing Alerts — Seriously!
We learned this lesson the hard way. Our budget for the month? Gone in just 17 days. Avoid the shock—enable billing alerts and keep a close eye on your usage.
4. Minimize Output Tokens — They're 4x the Cost!
Output tokens are way more expensive than input tokens, so we restructured our responses. Instead of having the model generate long-form text, we had it return position numbers and categories, then mapped those in our code. This simple tweak cut our output token usage by about 70%—plus, it sped things up significantly.
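Here's a rough sketch of the pattern, with hypothetical category names and an illustrative output format; the exact scheme is whatever your code can map back:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical category table; the model only ever outputs the short keys.
CATEGORIES = {1: "Tutorial", 2: "Product Update", 3: "Case Study", 4: "Opinion"}

prompt = (
    "Classify each headline below. Reply with one line per headline, "
    "formatted as '<headline number>:<category number>'. Categories: "
    + "; ".join(f"{k}={v}" for k, v in CATEGORIES.items())
)

headlines = ["How to batch OpenAI requests", "Our January changelog"]
numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(headlines, 1))

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{prompt}\n\n{numbered}"}],
)

# Map the terse "1:2" style output back to full labels in code,
# instead of paying for the model to spell the labels out.
for line in response.choices[0].message.content.strip().splitlines():
    idx, cat = (part.strip() for part in line.split(":", 1))
    print(headlines[int(idx) - 1], "->", CATEGORIES[int(cat)])
```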
5. Batch Your Requests — Efficiency is Key
Initially, we were making separate API calls for every little task. Big mistake. Now, we batch related tasks into a single request. Instead of doing:
Request 1: "Analyze the sentiment"
Request 2: "Extract keywords"
Request 3: "Categorize"
We now do:
Request 1:
"1. Analyze sentiment
2. Extract keywords
3. Categorize"
It’s cleaner, faster, and way more cost-efficient.
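Concretely, the combined call might look something like this (the prompt wording and output format are ours to illustrate, not a canonical recipe):

```python
from openai import OpenAI

client = OpenAI()

text = "The new dashboard is fantastic, though setup took a while."

# One request instead of three: sentiment, keywords, and category together.
combined_prompt = f"""For the text below, do all three tasks:
1. Analyze sentiment (positive/negative/neutral)
2. Extract up to 5 keywords
3. Categorize (feedback/question/complaint)

Answer as three labeled lines: SENTIMENT:, KEYWORDS:, CATEGORY:.

Text: {text}"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": combined_prompt}],
)
print(response.choices[0].message.content)
```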
6. Batch API for Non-Urgent Tasks — A No-Brainer
OpenAI’s Batch API was a lifesaver. We moved all our overnight processing there and slashed costs by 50%. The 24-hour turnaround time is totally worth it for non-time-sensitive workloads.
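For reference, here's a minimal sketch of submitting a batch with the official SDK; the task list and file name are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()

# Each JSONL line is one chat completion request; custom_id lets you
# match results back to your tasks when the batch finishes.
tasks = ["Summarize article 1 ...", "Summarize article 2 ..."]
with open("overnight_batch.jsonl", "w") as f:
    for i, task in enumerate(tasks):
        f.write(json.dumps({
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": task}],
            },
        }) + "\n")

# Upload the file, then submit the batch with a 24-hour completion window.
batch_file = client.files.create(
    file=open("overnight_batch.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) later
```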
Bonus Tips from Someone Spending 2 Billion Tokens per Month
- Trim Your Prompts — Most prompts can be shorter than you think. If your input is heavily prompt-weighted, cutting 20% of unnecessary text can save big bucks (see the token-counting sketch after this list).
- Ditch JSON When Possible — For simpler outputs, plain text is often more efficient and gets better results. The model works better when it’s not overly constrained.
- Retry Before Upscaling — Instead of jumping to a bigger model, let a smaller one take a stab at it first. If the response isn’t good enough, retry before falling back on a more expensive model. The cheaper models do the job about 80% of the time! (See the fallback sketch after this list.)
- Translate Prompts for Better Foreign Language Outputs — Instead of just telling the model to respond in another language, translate your prompt first and cache it. You’ll get significantly better results.
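On trimming prompts: a quick way to measure what a trim actually saves is to count tokens locally with tiktoken. A sketch, assuming you keep prompt versions in files (the file names are hypothetical):

```python
import tiktoken

# o200k_base is the tokenizer used by the gpt-4o model family.
enc = tiktoken.get_encoding("o200k_base")

def token_count(text: str) -> int:
    return len(enc.encode(text))

before = open("prompt_v1.txt").read()
after = open("prompt_v2_trimmed.txt").read()

saved = token_count(before) - token_count(after)
print(f"Trimmed {saved} tokens per request")
# At $0.15 per 1M input tokens, multiply by request volume to see the savings.
print(f"~${saved * 0.15 / 1_000_000:.6f} saved per request on gpt-4o-mini input")
```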
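And on retrying before upscaling: a minimal sketch of the escalation pattern, assuming gpt-4o as the fallback model and a placeholder looks_good() quality check:

```python
from openai import OpenAI

client = OpenAI()

def looks_good(answer: str) -> bool:
    # Stand-in quality check; in practice this might be a length check,
    # a regex on the expected format, or a separate validation pass.
    return len(answer.strip()) > 0 and "I'm not sure" not in answer

def ask(prompt: str, retries: int = 2) -> str:
    # Let the cheap model take a couple of swings first...
    for _ in range(retries):
        answer = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        if looks_good(answer):
            return answer
    # ...and only pay for the big model when it keeps missing.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```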
Source: Reddit
Follow us on X for more content!