Google Adds Batch Mode to Gemini API for Cheaper, Scalable AI Jobs
Google has launched a new Batch Mode for its Gemini API, targeting developers with high-volume, non-urgent AI workloads. The new asynchronous endpoint allows users to process large batches of prompts at half the cost of the synchronous API. It also accepts JSONL input files of up to 2 GB and supports advanced features like context caching and integrated tools. This update makes Gemini more viable for enterprises needing affordable, scalable AI infrastructure.
Google has introduced a Batch Mode for the Gemini API, aimed at developers running large-scale, non-latency-sensitive workloads. The new endpoint handles asynchronous jobs with results returned within 24 hours, bundling many prompts into a single API call. Users can upload JSONL files of up to 2 GB, with a cap of 2,000 requests per job. Batch Mode also supports integrated tools such as Google Search, along with context caching to improve efficiency across large datasets.
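To make the workflow concrete, here is a minimal sketch of submitting a batch job, assuming the google-genai Python SDK's files and batches interfaces. The file name, prompts, model string, and upload config are illustrative assumptions, not details from Google's announcement.

```python
# Sketch of a Batch Mode job, assuming the google-genai Python SDK
# (pip install google-genai) and a GEMINI_API_KEY in the environment.
# File names, prompts, and the model string are illustrative.
import json
import time

from google import genai

client = genai.Client()

# Each JSONL line is one request; the "key" field lets results be matched
# back to their inputs once the asynchronous job completes.
prompts = ["Summarize photosynthesis in one sentence.",
           "Translate 'good morning' to French."]
with open("batch_requests.jsonl", "w") as f:
    for i, text in enumerate(prompts):
        row = {"key": f"req-{i}",
               "request": {"contents": [{"parts": [{"text": text}]}]}}
        f.write(json.dumps(row) + "\n")

# Upload the request file (Batch Mode accepts JSONL files up to 2 GB),
# then create the asynchronous batch job against it.
uploaded = client.files.upload(
    file="batch_requests.jsonl",
    config={"display_name": "batch-demo", "mime_type": "jsonl"},
)
job = client.batches.create(model="gemini-1.5-pro", src=uploaded.name)

# Poll until the job reaches a terminal state; results are promised
# within the 24-hour turnaround window, so polling can be infrequent.
while job.state.name not in ("JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED",
                             "JOB_STATE_CANCELLED"):
    time.sleep(60)
    job = client.batches.get(name=job.name)

# On success, the output is itself a JSONL file of keyed responses.
if job.state.name == "JOB_STATE_SUCCEEDED":
    results = client.files.download(file=job.dest.file_name)
    print(results.decode("utf-8"))
```

Keying each request keeps the output file order-independent, and because the job is fire-and-forget, a client can submit, disconnect, and collect results later.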
Pricing for Batch Mode is set at 50% below the standard synchronous API, making it a cost-efficient option for teams focused on content generation, model evaluation, or data labeling. Google emphasizes the scalability of this approach, offering higher rate limits and reduced overhead for managing large volumes of calls. This positions Gemini more competitively against alternatives like OpenAI’s batch processing and aligns with broader enterprise use cases.
The update also signals Google’s intent to make its GenAI infrastructure more developer-friendly and operationally practical. By targeting workflows where real-time speed is unnecessary, Batch Mode unlocks more budget-conscious applications of Gemini models. It is currently available in public preview for Gemini 1.0 Pro and 1.5 Pro across Vertex AI and Google AI Studio.