AI News: Groundbreaking New Language Models from OpenAI & Google

Groundbreaking new language models from OpenAI and Google are reshaping AI capabilities. Discover the GPT-4.1 updates, new AI coding tools, and the latest advances in multimodal AI video generation. Stay ahead of the curve on the latest AI news and innovations.

April 20, 2025


OpenAI has released a series of impressive new AI models, including GPT-4.1, which offers enhanced capabilities and lower pricing than previous versions. Google has unveiled Gemini 2.5 Flash, a powerful and versatile language model, while Anthropic's Claude and xAI's Grok have also gained new features and updates. These advancements showcase the rapid progress in large language models and their growing capabilities in areas such as problem-solving, coding, and multimodal reasoning.

GPT-4.1 Models: Smarter, Faster, and More Cost-Effective

OpenAI has rolled out a new set of GPT-4.1 models, including GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano. These models are designed to be smarter, faster, and more cost-effective than their predecessors.

The key features of the GPT-4.1 models include:

  1. Improved Performance: The GPT-4.1 models are more capable than the previous GPT-4 and GPT-4.5 models, particularly in areas like coding, math, and reasoning.

  2. Faster Response Times: Unlike OpenAI's o-series reasoning models, the GPT-4.1 models are not "thinking" models, meaning they return responses quickly without an extended internal reasoning phase.

  3. Massive Context Window: The GPT-4.1 models have a 1 million token context window, allowing them to draw upon a vast amount of information during their processing.

  4. Significantly Lower Pricing: The GPT-4.1 models are far more affordable to use, with a blended rate of roughly $1.84 per million tokens, compared to GPT-4.5's $75 per million input tokens and $150 per million output tokens (a quick cost comparison follows this list).
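To make the price gap concrete, here is a quick back-of-the-envelope comparison. The per-token rates are the figures quoted above; the 80/20 input/output split is an illustrative assumption, not a published workload mix.

```python
# Rough cost comparison for a job that processes 10 million tokens,
# assuming an illustrative 80% input / 20% output split.
TOKENS = 10_000_000
INPUT_SHARE, OUTPUT_SHARE = 0.8, 0.2

# Per-million-token rates in USD: GPT-4.5's input/output prices and
# the ~$1.84 blended figure quoted for GPT-4.1.
GPT45_IN, GPT45_OUT = 75.0, 150.0
GPT41_BLENDED = 1.84

gpt45 = (TOKENS * INPUT_SHARE / 1e6) * GPT45_IN + (TOKENS * OUTPUT_SHARE / 1e6) * GPT45_OUT
gpt41 = (TOKENS / 1e6) * GPT41_BLENDED

print(f"GPT-4.5: ${gpt45:,.2f}")  # ~$900.00
print(f"GPT-4.1: ${gpt41:,.2f}")  # ~$18.40
```

At this workload, GPT-4.1 comes in roughly fifty times cheaper, which is the scale of savings the new pricing implies.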

These improvements make the GPT-4.1 models a more attractive option for users who need powerful language models but are constrained by cost or response-time requirements. The models also accept image inputs and support tool use, which further broadens what they can be applied to.

Thinking Models with Integrated Search and Tools

OpenAI has released a new generation of "thinking" language models, o3 and o4-mini, that can integrate web searches and tool usage into their reasoning process. These models are able to:

  • Analyze images as part of their thinking process, using visual reasoning to supplement textual information.
  • Perform web searches, review the results, and refine their responses based on the new information.
  • Utilize various tools, such as code execution, data analysis, and information retrieval, to aid in problem-solving (a minimal sketch of such a loop follows this list).
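As a rough illustration of how a tool-integrated reasoning loop can be structured, here is a minimal, self-contained sketch. It is not OpenAI's implementation: the model call and both tools are illustrative stubs, and a real harness would call an actual LLM API.

```python
# Hypothetical agent loop: the model proposes tool calls, the harness
# executes them, and the results are fed back until the model answers.

TOOLS = {
    "web_search": lambda query: f"[stub search results for: {query}]",
    "run_python": lambda code: f"[stub output of: {code!r}]",
}

def call_model(messages):
    """Stub for a reasoning-model call. Returns a tool request on the
    first pass and a final answer once a tool result is available."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": "California energy usage 2024"}
    return {"answer": "Final answer grounded in the gathered tool results."}

def agent_loop(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = call_model(messages)
        if "answer" in step:                        # model is done reasoning
            return step["answer"]
        result = TOOLS[step["tool"]](step["args"])  # run the requested tool
        messages.append({"role": "tool", "content": result})
    return "Step limit reached without a final answer."

print(agent_loop("Forecast California's residential energy usage."))
```

The point of the sketch is the control flow: the model, not the harness, decides when to search, when to compute, and when it has enough information to answer.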

The key advantage of these models is their ability to dynamically gather and synthesize information, rather than relying solely on their training data. This allows them to tackle more complex, open-ended problems that require research, analysis, and the application of diverse skills.

For example, when asked a question about energy usage in California, the model can:

  1. Search for relevant data from public utility sources
  2. Write Python code to analyze the data and generate a forecast (a rough sketch of such code appears after this list)
  3. Create a visualization to explain the key factors
  4. Provide a detailed, thoughtful answer that integrates the research and analysis.
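As a concrete illustration of steps 2 and 3, the snippet below fits a simple linear trend to made-up annual usage figures and plots a short forecast. The numbers are fabricated placeholders, not real utility data; an actual model-written analysis would work from the sources it retrieved in step 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic annual usage figures standing in for real utility data.
years = np.arange(2015, 2025)
usage_twh = np.array([285, 287, 284, 286, 280, 272, 277, 281, 283, 285])

# Fit a linear trend and extrapolate five years ahead.
slope, intercept = np.polyfit(years, usage_twh, 1)
future = np.arange(2025, 2030)
forecast = slope * future + intercept

plt.plot(years, usage_twh, "o-", label="observed (synthetic)")
plt.plot(future, forecast, "x--", label="linear forecast")
plt.xlabel("Year")
plt.ylabel("Usage (TWh)")
plt.title("Toy energy-usage forecast")
plt.legend()
plt.show()
```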

The models' capacity for multi-step reasoning, tool usage, and information gathering makes them powerful problem-solving assistants. They represent a significant advancement in language model capabilities, blending textual, visual, and computational reasoning to tackle complex, real-world challenges.

Impressive Benchmarks for the GPT-4.1 and o-Series Models

The new GPT-4.1 models from OpenAI are quite impressive, with several key capabilities:

  • Massive context window of 1 million tokens (roughly 750,000 words), allowing them to draw on a vast amount of information during their reasoning process.
  • Strong performance on the "needle in a haystack" benchmark, accurately finding small pieces of information buried in large amounts of text.
  • On par with GPT-4.5 in vision tasks and slightly better at reasoning.
  • Significantly cheaper to use than the previous GPT-4.5 model, costing only $1.84 per million tokens compared to $75-$150 for GPT-4.5.

OpenAI also released the o3 and o4-mini models, which are "thinking" models that can integrate images and use tools during their reasoning process. These models have shown impressive results:

  • o3 without tools scored 88.9% on a competition math benchmark.
  • o4-mini without tools scored 92.7% on the same benchmark.
  • With tools like web search and Python code execution available during reasoning, o3 with Python scored 95.2% and o4-mini with Python scored 99.5% on the math benchmark.

The integration of visual and textual reasoning, as well as the use of tools, makes these models highly capable at problem-solving and task completion. They can search the web, analyze data, write code, and generate detailed, thoughtful answers - all as part of their reasoning process before providing the final output.

These advancements in the GPT-4.1 and o-series models demonstrate the rapid progress being made in large language models and their ability to tackle increasingly complex tasks.

Upcoming OpenAI Releases: o3-pro and a Potential Social Network

According to the transcript, there are a few upcoming announcements and releases from OpenAI:

  1. o3-pro: Sam Altman, the CEO of OpenAI, has stated that they expect to release o3-pro to the Pro tier in a few weeks. Given how impressive the current o3 model is, the o3-pro version is expected to be even more powerful.

  2. Potential Social Network: There are also rumors circulating that OpenAI is working on a social media network similar to X (formerly Twitter). However, details are limited, and it's unclear whether this plan will actually come to fruition.

The transcript suggests that these upcoming releases are highly anticipated by the AI community. The o3-pro model is expected to further demonstrate the capabilities of reasoning models, while the potential social network would be an interesting new venture for the company.

Microsoft Copilot Studio's Computer Use Feature

Microsoft announced this week that it will be rolling out a new computer use feature directly inside Microsoft Copilot Studio. The feature is not yet available, but Microsoft plans to showcase it further at its Microsoft Build event next month.

From the information provided, this new computer usage functionality will leverage OpenAI's capabilities to allow Microsoft Copilot Studio to take control of the user's computer and perform actions on their behalf. The article mentions that this could enable Copilot to assist users by automating various tasks directly on their system.

While details are limited at this time, Microsoft's announcement suggests this integration of AI-powered computer control could significantly expand the capabilities of their Copilot assistant. Users interested in testing this feature early can fill out a form at the bottom of the announcement article.

Google's New Gemini 2.5 Flash Model Impresses

Google just rolled out a brand new large language model called Gemini 2.5 Flash. Gemini 2.5 Pro has been one of the most popular models recently, performing better than even Anthropic's Claude 3.7 Sonnet in many areas.

The new Gemini 2.5 Flash model is lighter and faster than Gemini 2.5 Pro. It is also Google's first fully hybrid reasoning model, allowing developers to turn the thinking capability on or off as needed.

Compared to other models like o4-mini, Claude 3.7 Sonnet, and DeepSeek R1, Gemini 2.5 Flash is considerably less expensive. Turning on the reasoning feature, however, brings the cost more in line with those models.

In user preference testing on the LM Arena benchmark, Gemini 2.5 Flash outperforms several other popular models while being priced competitively. It seems to excel at science, mathematics, coding, and visual reasoning.

Google has also made Gemini 2.5 Flash available to try for free on their AI Studio platform at ai.dev. This allows developers to test out the model's capabilities, including the ability to toggle the reasoning feature on and off.
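For developers who want to script that reasoning toggle rather than use the AI Studio UI, here is a minimal sketch. It assumes the google-genai Python SDK and the preview model id used at launch; both could differ from your environment, so treat those names as assumptions and check the current docs.

```python
# Minimal sketch: toggling Gemini 2.5 Flash "thinking" via a token budget.
# Assumes the google-genai SDK; the model id is the launch-era preview name.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

def ask(prompt: str, thinking_budget: int) -> str:
    # A budget of 0 turns thinking off; larger budgets allow more
    # reasoning tokens (and raise cost accordingly).
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",  # assumed preview id
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget)
        ),
    )
    return response.text

print(ask("Summarize quicksort in two sentences.", thinking_budget=0))       # fast, cheap
print(ask("Prove there are infinitely many primes.", thinking_budget=1024))  # reasoning on
```

This mirrors the pricing note above: with the budget at 0 you pay the cheaper non-thinking rate, and raising it moves the cost toward the reasoning-model tier.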

Overall, the new Gemini 2.5 Flash model appears to be an impressive and cost-effective option for developers looking to integrate large language model capabilities into their applications.

Anthropic's Claude Adds Research and Google Workspace Integration

Anthropic's AI assistant, Claude, has received a couple of notable updates this week:

  1. Research Feature: Claude has a new "Research" capability that can pull information from connected Google services like Gmail, Calendar, and Google Drive to help with tasks such as planning a trip. The research feature is currently in early beta for Anthropic's Max, Team, and Enterprise plans.

  2. Google Workspace Integration: Claude now allows users to connect their Google Workspace apps like Drive, Calendar, and Gmail. This integration is available to all paid Claude users, including those on the $20/month plan. Users can now access these services directly within the Claude interface.

Additionally, it's reported that Anthropic is nearing the launch of a new "voice mode" for Claude, which would make it the last major AI assistant to roll out voice capabilities, following similar features from OpenAI's ChatGPT and others. The voice mode is expected to launch in a limited capacity initially, likely first for Anthropic's higher-tier customers.

These updates further enhance Claude's capabilities and integrations, allowing users to leverage the assistant's natural language abilities alongside their existing productivity tools and data sources.

Grok Studio and Memory Features

Grok, the AI assistant from xAI, has released a few new updates this week:

  1. Grok Studio: This new feature adds code execution and Google Drive support to Grok. It looks similar to OpenAI's Canvas, with the chat pushed to the side and a new workspace opening on the right. Grok can now generate documents, code, reports, and even browser games directly within this interface.

  2. Memory Feature: Grok has also rolled out a memory feature, similar to the one recently introduced by OpenAI. This allows Grok to remember past conversations and personalize its responses based on that context. Users can see what Grok knows about them and choose what it should forget.

These new features are currently available on the Grok website, grok.com, but may not yet be integrated into the version of Grok available on X. The memory feature is still in beta.

Overall, these updates bring Grok closer to the capabilities of other leading AI assistants, letting users work with Grok in a more seamless and integrated way.

Kling 2.0 Introduces Multimodal Video Generation

This week, Kling launched version 2.0 of its AI video generation model. With it, Kling AI officially introduces a new interaction concept for AI video generation: multimodal visual language.

This concept enables users to efficiently convey complex multi-dimensional creative ideas such as identity, appearance, style, scenes, actions, expressions, and camera movements directly to AI by integrating multimodal information like image references and video clips.

Some key improvements in Kling 2.0 include:

  • Better adherence to actions, with more natural range of motion and realistic movement speeds.
  • Enhanced dynamics and cinematic visual style consistency.
  • Dramatic expressions and improved camera movements.

Kling has showcased some impressive video samples generated with the new 2.0 model, including a jet flying sequence, a scene of Native Americans riding horses, and a creative take on a Titanic movie scene.

The model also now supports features like swapping out actors in existing film scenes, as well as the ability to convey specific emotions and gestures through AI-generated avatars.

Overall, Kling 2.0 represents a significant step forward in AI-powered video generation, blending textual prompts with multimodal inputs to create highly realistic and cinematic video content.

Arcads.ai Offers Gesture-Controlled AI Actors

Arcads.ai is a new platform for generating AI-powered actors with specific gestures and emotions. Users can prompt the AI actors to perform expressions and actions such as crying, laughing, celebrating, and pointing.

The actors are based on real images of real actors, but the platform uses AI to make them perform various actions and expressions. This allows users to create ads and other content with AI-generated performances.

While the technology looks impressive, the pricing model is a concern: the lowest plan starts at $110 per month for 10 videos, which may be a barrier for some users, and the platform does not offer a free trial.

Overall, Arcads.ai represents an interesting development in the world of AI-generated content, particularly in the realm of emotional and gesture-based performances. However, the high pricing and lack of a free trial may limit its accessibility for some users.

Luma Dream Machine Adds Camera Angle Adjustments

Luma Dream Machine, the AI video generation tool, has rolled out a new feature that allows users to adjust camera angles on the videos they generate.

Inside the prompt box, there is now a camera icon that users can click to access a variety of camera angle options. These include static, handheld, zoom in/out, pan left/right, tilt up/down, push in/pull out, and truck left/right.

This new feature gives users more control over the cinematic look and feel of their AI-generated videos. For example, using the point-of-view camera option, the reviewer was able to generate a first-person view looking out of the cockpit of a fighter jet.

While the initial test generated a few extra dials on the dashboard compared to a real fighter jet, the ability to specify camera angles is a significant enhancement to Luma Dream Machine's capabilities. With more prompting and testing, users can likely achieve even more polished and realistic camera perspectives in their AI-generated video content.

This update demonstrates Luma Dream Machine's commitment to expanding the creative possibilities for users working with AI-powered video generation. The new camera angle controls open up new avenues for producing dynamic, visually compelling footage.

Other AI Advancements: Krisp's Accent Conversion, Netflix's OpenAI-Powered Search, and Google's AR Glasses

Krisp, a tool known for removing background noise and improving audio quality, now has a feature that converts accents. This could allow call centers to make a speaker with an Indian accent sound like they have a Texas accent.

Netflix is testing a new OpenAI-powered search engine to better recommend shows and movies to users. It will let users search on more specific criteria, such as their mood. The feature is rolling out first in Australia and New Zealand on iOS devices.

Google has shown off new AR glasses that can translate languages in real-time, provide navigation, and display information in a heads-up display. The glasses were demoed at a recent TED talk and the speaker was able to get directions, identify a book, and more using the glasses' capabilities. This form factor is similar to Meta's Ray-Ban smart glasses, and it's rumored that Google wants to release something like this before Meta.

Conclusion

In this week's AI news roundup, we saw several major developments from leading AI companies:

  • OpenAI announced the retirement of GPT-4 and the introduction of GPT-4.1, a new model with improved capabilities and lower pricing. It also released o3 and o4-mini, thinking models able to integrate images and use tools during the reasoning process.

  • Google unveiled Gemini 2.5 Flash, a new large language model with hybrid reasoning capabilities that outperforms many competitors in user preference testing. It also launched DolphinGemma, an open model for studying dolphin communication.

  • Anthropic added research and Google Workspace integration features to their Claude model, and is reportedly nearing the launch of a voice mode.

  • Kling AI released version 2.0 of its AI video generation model, with significant improvements in adherence to actions, camera movements, and emotional expression.

  • Emerging tools like Arcads.ai and Google's new AR glasses showcase the rapid advancements in multimodal AI capabilities, blending visual, textual, and even gestural inputs.

Overall, this week's announcements highlight the continued progress in large language models, multimodal AI, and the integration of these technologies into practical applications and user-friendly interfaces. The pace of innovation in the AI field shows no signs of slowing down.
