Google does not serve all models equally
Darsan Guruvayurappan
15 Dec 2025
Tl;dr - Gemini API calls with images may give you incorrect responses if you're using the Gemini Developer API endpoint. Fix it by switching to the Vertex endpoint.
Early last week, one of our clients filed a ticket complaining that our translation feature was not working properly. The feature wasn't throwing errors; rather, for specific pages in some languages, the translation was wildly incorrect.
Our PM flagged this, and I thought, “hmm… must be an issue with our image resolutions, I’m sure we can solve it by tonight”. Boy, was I in for some pain.
Step 1 - Triage
The ticket we received was simple enough: PDF documents in Marathi weren’t translating correctly. For some context, Lucio has a translation feature that takes in non-English documents and returns a translated version that maintains the formatting of the original document.
Lawyers typically don't share documents with third parties, so we couldn't get our hands on the problematic document. To triage this, we found a bunch of non-English documents online and ran them through our systems.
Not a single error.
We found more documents. And more. And even more. Yet we were consistently receiving high quality outputs on our testing and production environments.
Deployment Woes
Most of our clients are law firms that often have incredibly high data privacy standards. As a result, we offer a bevy of deployment options, based on jurisdiction, data residency requirements, and more. For the most sensitive firms, we also offer the option to deploy Lucio fully on the firm's VPC, so that their data never leaves their infrastructure.
While this makes for a great sales pitch, it results in a fairly large technical surface area for us to keep track of. Managing different cloud providers, LLM availability, rate limits, and more can get tricky very fast.
This client, in particular, was on a self-hosted deployment. Their configs seemed correct; nothing was out of place there. Our PM tried uploading a document (which had worked on our instance) into their instance.
Complete gibberish came out the other end.
We had a reproduction.
Broken Pipelines?
Mind you, we didn't receive any obvious error codes. The pipelines were still running; it was just that some pages gave results that looked right, but were completely wrong.
Our first thought, of course, was that the configs were somehow broken. We audited their entire config set top to bottom, but nothing was amiss. In fact, it was almost exactly the same as our testing environment.
Our next thought was our pipeline was somehow broken. For the translation feature, when a user uploads a document, we run it through a fairly complex set of transformations to get the final result. At a very high level it looks somewhat like this:
1. User uploads a document
2. Normalize and extract text from it
3. Translate the text using conventional translation engines and LLMs
4. Use vision LLMs to parse document layout and structure
5. Synthesize the content from steps 3 and 4 for the final result
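The steps above can be sketched as a thin pipeline. Every function name here is hypothetical and every stage is stubbed; the point is only the shape of the flow, and where a silent failure can hide:

```python
def extract_text(document: bytes) -> str:
    """Step 2: normalize the upload and extract raw text (stubbed)."""
    return document.decode("utf-8", errors="replace")

def translate(text: str) -> str:
    """Step 3: translate via conventional engines and LLMs (stubbed)."""
    return f"[translated] {text}"

def parse_layout(document: bytes) -> dict:
    """Step 4: vision-LLM pass over page images for layout/structure (stubbed)."""
    return {"blocks": [{"type": "paragraph"}]}

def synthesize(translated: str, layout: dict) -> str:
    """Step 5: re-flow the translated text into the original layout (stubbed)."""
    return translated

def run_pipeline(document: bytes) -> str:
    text = extract_text(document)   # an extraction failure here looks like gibberish, not an error
    translated = translate(text)    # so does a silently degraded LLM response
    layout = parse_layout(document)
    return synthesize(translated, layout)
```

Note that no stage raises on bad output: each one happily passes garbage downstream, which is exactly why the bug surfaced as "wrong results" rather than error codes.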
There are several places where things could go wrong here. Extraction errors, low-res image generation due to PDF conversion failing, LLMs acting up, and more. We audited every single line of code, ran our manual tests, yet nothing seemed broken.
The next few days were a blur:
Our stack has multiple ways of calling LLMs. We switch between Azure OpenAI, Gemini SDKs, Portkey, and OpenRouter for calls depending upon the deployment.
Some poorly generated PDF files have incorrect encodings. They visually look correct, but extracting the text results in gibberish. This was our leading thesis for a while.
We switched models, pinned versions, changed prompts, fiddled around with every possible setting when calling LLMs.
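The bad-encoding thesis is worth a concrete illustration. A poorly generated PDF can map glyphs to the wrong code points, so the page renders correctly while the extracted text stream is junk. A rough stand-in for this, using a plain encoding mismatch on Marathi text (the sample string is illustrative):

```python
original = "मराठी मजकूर"  # Marathi for "Marathi text"

# Decode the UTF-8 bytes with the wrong codec: the information is all still
# there, but the result is unreadable byte soup, just like a bad text layer.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# Round-tripping with the correct codecs recovers the original.
assert garbled.encode("latin-1").decode("utf-8") == original
```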
Many coffees later, we realized that the pipelines were throwing out garbage results at a particular point where we were relying on Gemini models.
Step 2 - Eureka
Before we move forward, a quick primer on how Google serves their LLMs.
Google offers two main ways of accessing their models: the Gemini Developer API and Vertex AI. Google's had a long and annoying history with their LLM SDKs. They started out with Vertex AI, which is enterprise-focused and a complete pain to use. Once their quickstart reached the point where it required 4 days and 2 full-stack developers to complete, they decided to abandon it and start over with the Gemini Developer API.
This is a much simpler, API-key-based service that is currently the recommended way to access their models. Their new Python SDK (called google-genai, which is completely different from the now-deprecated google-generativeai) offers a simple way to switch between the Vertex and GDAPI endpoints.
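As a sketch (assuming `google-genai` is installed; the project id is a placeholder), the switch between the two backends is just the client constructor:

```python
from google import genai

# Gemini Developer API (GDAPI): key-based; reads GEMINI_API_KEY if no key is passed.
gdapi_client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Vertex AI: same SDK, same model identifiers, different backend.
vertex_client = genai.Client(
    vertexai=True,
    project="your-gcp-project",  # placeholder GCP project id
    location="global",
)
```

Everything downstream (`client.models.generate_content(...)` and friends) is identical for both clients, which is precisely why a quality difference between the two backends is so surprising.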
Subtleties
At Lucio, we consume Gemini models through a variety of sources:
Our Dev instances use models through OpenRouter; sometimes through their BYOK service to route requests through our own endpoints.
Our production instances use Portkey to route requests through our own GDAPI and Vertex endpoints.
Some deployments, where Portkey and OpenRouter are both not allowed, directly use the google-genai package. Where possible, we use the Vertex endpoint first and fall back to GDAPI.
Theoretically, every single one of these calls should give the same result, since at the end of the day, they are all routed to Google's servers.
We were, however, at the point of throwing things at the wall to see what stuck. While messing around with our OpenRouter configs, we discovered something interesting: when we disabled our Vertex API BYOK, the response quality degraded.
That made me want to test every single model call separately. Here’s what we saw:
- OpenRouter call via google/gemini-2.5-flash — worked
- Gemini call (through Vertex) — worked
- OpenRouter via google-vertex/global — worked
- OpenRouter via google-vertex — worked
- OpenRouter via google-ai-studio — worked
- Gemini OpenAI-compatible endpoint on Vertex — worked
- OpenRouter with BYOK Vertex endpoint disabled — garbage?!
- OpenRouter with BYOK AI Studio endpoint disabled — worked
- Gemini call (through GDAPI) — garbage?!
The tests were clear: for some reason, LLM calls with images through our GDAPI key were giving poor-quality results. OpenRouter through google-ai-studio was fine? Switching to Vertex somehow made it work?
Madness.
Step 3 - Reproduction
The error was reproducible: GDAPI consistently gave irrelevant results, while both Vertex and OpenRouter returned the correct response.
You can run these snippets to verify the error yourself. Fetch this random test image to get started. I ran these on 15th December 2025 to get these results.
GDAPI Bad Result
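A minimal sketch of the GDAPI call using the google-genai SDK. The model name and prompt are illustrative, and the snippet assumes the test image is saved locally as `test_image.png` and that GEMINI_API_KEY is set:

```python
from google import genai
from google.genai import types

# Gemini Developer API client (reads GEMINI_API_KEY from the environment).
client = genai.Client()

with open("test_image.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Transcribe the text in this image.",
    ],
)
print(response.text)
```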
Output (completely irrelevant)
One line change to use Vertex
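The rest of the call stays identical; only the client construction changes (project id is a placeholder):

```python
from google import genai

# Same SDK, same model, same request; only the backend differs.
client = genai.Client(
    vertexai=True,
    project="your-gcp-project",  # placeholder GCP project id
    location="global",
)
```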
Output (correct)
Calls through OpenRouter
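OpenRouter exposes an OpenAI-compatible endpoint, so a sketch of the same request looks like this (the API key and prompt are placeholders; the image is sent as a base64 data URL):

```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

with open("test_image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the text in this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```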
Output (same as vertex call)
Conclusions
Shifting to the Vertex endpoint and removing the GDAPI endpoint as a translation fallback fixed the issue. A week of debugging, reduced to a one-line fix.
It is clear that somehow, though the model identifiers are identical, Google serves different versions of the model depending on how you access it. What is confusing, however, is why they would provide a degraded version of their model when you access it directly, but a better version through OpenRouter.
Is it a quantization thing? Or possibly some kind of a botched update? Maybe our API key is cursed. LLMs are already difficult to work with and scale; these kinds of issues add an extra level of spiciness to the whole affair!
Darsan is a co-founder of Lucio, and is passionate about all things Law, AI, and Tech. If you enjoy solving fun problems like this at the cutting edge of AI, come work with us at Lucio!