Llama 4 Controversy: Meta's Largest Open-Source AI Model Faces Backlash

Llama 4, Meta's largest open-source AI model, faces backlash over claims of manipulated benchmark results. Explore the controversy surrounding this groundbreaking release and its potential impact on the AI landscape.

April 13, 2025


Discover the latest advancements in AI with a deep dive into the release of Llama 4, Microsoft's AI Copilot updates, and the exciting developments from Google, OpenAI, and other industry leaders. Stay ahead of the curve and learn how these cutting-edge technologies can empower your creativity and productivity.

The Huge Context and Controversy Surrounding Llama 4

Meta's release of Llama 4 was a major event in the AI world this week. The two available models, Llama 4 Scout and Llama 4 Maverick, boast impressive capabilities:

  • Llama 4 Scout has a massive 10 million token context window, allowing it to draw on roughly 94 novels' worth of text at once (a rough back-of-envelope estimate of that figure is sketched after this list).
  • Llama 4 Maverick has a still-huge 1 million token context window.
  • Scout's window in particular far exceeds previous leaders like Google's Gemini models, which top out at around 2 million tokens.
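
To put the 10-million-token figure in perspective, here is a rough back-of-envelope estimate of where the "94 novels" number comes from; the words-per-token and words-per-novel conversion factors are assumptions, not figures from Meta.

```python
# Rough estimate of how much text fits in a 10-million-token context window.
# Conversion factors are assumptions (not from Meta): ~0.75 words per token,
# ~80,000 words in a typical novel.
context_tokens = 10_000_000
words_per_token = 0.75
words_per_novel = 80_000

total_words = context_tokens * words_per_token   # ~7.5 million words
novels = total_words / words_per_novel           # ~94 novels
print(f"~{total_words:,.0f} words, or roughly {novels:.0f} novels")
```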

However, the release has been mired in controversy. An alleged whistleblower from Meta's AI team claimed the company "mixed various benchmark test sets into the post-training process" to make the models appear stronger than they truly are in real-world performance.

Meta has refuted these claims, stating the variable quality seen is due to implementation issues by those testing the models. But the LM Arena leaderboard seems to support the whistleblower's account, with Llama 4 models dropping significantly in rankings after initial strong showings.

Further complicating matters, Meta apparently used a customized version of Llama 4 Maverick on LM Arena that was optimized for human preference, rather than the open-sourced version released to the public. This lack of transparency has raised concerns in the AI community.

The Llama 4 saga highlights the ongoing challenges of model benchmarking and the potential for misleading claims around AI capabilities. As this story continues to unfold, it will be important to closely scrutinize the real-world performance of these large language models.

Microsoft's AI Copilot Improvements and Quake II AI Demo

One of the biggest new features Microsoft added to their AI Copilot was the ability to remember past conversations. With your permission, Copilot will now remember what you've discussed, learning your likes, dislikes, and details about your life, work, and interests. This allows Copilot to provide more personalized and contextual assistance over time.

Microsoft also showcased an AI-generated version of the classic game Quake II. Their Muse AI model generates every single frame of the game on the fly as the player moves around. While the result isn't perfect, with some characters disappearing and other visual glitches, it demonstrates the impressive capabilities of AI in generating dynamic game environments.

This AI-powered Quake II demo received some backlash from game developers, who criticized the approach. However, id Software co-founder John Carmack defended the technology, arguing that it represents a significant leap in game development, enabling smaller teams to accomplish more and bringing in new creator demographics.

Overall, these advancements in Microsoft's Copilot and the Quake AI demo showcase the rapid progress of AI in enhancing productivity tools and generating interactive experiences. As these technologies continue to evolve, they will likely have a transformative impact on various industries, including gaming and software development.

Google's AI Expansions and Announcements

This week saw a number of announcements and updates from Google regarding their AI capabilities and offerings:

  • Google expanded their AI Mode in Search, improving its ability to handle comparisons, how-to queries, and longer, more open-ended questions. AI Mode can now also understand images, so you can search with a photo.

  • At the Google Cloud Next 25 event, the company announced its next-generation TPU (Tensor Processing Unit), Ironwood, coming later this year, as well as a new "A2A" (Agent2Agent) protocol that lets AI agents communicate and work together autonomously (a rough sketch of an A2A request follows this list).

  • Google introduced new AI-powered features across their Workspace products, including an audio feature in Google Docs similar to NotebookLM, a "Help me refine" feature in Docs, new AI enhancements in Google Sheets, and AI-powered summarization and Q&A capabilities in Google Meet.

  • Google also rolled out updates to their Vertex AI platform, including new editing and camera control features in Veo 2, the availability of Chirp 3 (an audio generation model), and Imagen 3 (their text-to-image model).

  • Additionally, a new text-to-music model called Lyria was made available in Vertex AI.
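
To make the A2A announcement a bit more concrete, here is a minimal sketch of what one agent sending a task to another might look like. A2A is built on JSON-RPC 2.0 over HTTP; the method name and field layout below follow the early public spec and are assumptions that may differ from the current revision.

```python
import json
import uuid

# Illustrative A2A-style request: one agent asks a remote agent to handle a task.
# The "tasks/send" method and the message/parts structure are taken from the
# early public A2A spec and may not match later revisions -- treat as a sketch.
request = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),  # task id
        "message": {
            "role": "user",
            "parts": [
                {"type": "text", "text": "Summarize this week's AI announcements."}
            ],
        },
    },
}

# The calling agent would POST this JSON to the remote agent's A2A endpoint,
# which it discovers through the agent's published "Agent Card" metadata.
print(json.dumps(request, indent=2))
```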

In summary, Google continues to expand the AI capabilities across its suite of products and platforms, integrating language models, text-to-image, text-to-audio, and other AI-powered features to enhance user experiences and productivity.

OpenAI's Upcoming Model Releases and Memory Feature

Apparently, OpenAI is going to release its o3 and o4-mini models after all, despite earlier plans to skip straight ahead to GPT-5. According to Sam Altman, GPT-5 is taking longer than expected, so OpenAI will release o3 and o4-mini in the meantime.

OpenAI also announced a new memory feature for ChatGPT, which allows the chatbot to tailor its responses based on the contents of previous conversations. When logging into ChatGPT, users can now see an "Introducing new, improved memory" box, and the feature can generate a personalized description of the user based on their chat history.

This new memory feature is aimed at making ChatGPT's responses more contextual and tailored to individual users, rather than providing generic responses. It demonstrates OpenAI's efforts to continuously improve the capabilities and user experience of their flagship language model.

Anthropic's New Max Plan and Upcoming Claude 4

Anthropic has rolled out a new Max plan, a more expensive tier of their Claude AI assistant, similar to the higher-end versions of ChatGPT. It comes in two tiers: $100 per month for 5 times the usage of the Pro plan, and $200 per month for 20 times the Pro plan's usage.

Additionally, Anthropic's Chief Scientist Jared Kaplan has stated that Claude 4 is expected to arrive within the next 6 months, so we can likely expect to see a new, improved version of the Claude AI sometime this year.

YouTube's Free AI Music Tool and DaVinci Resolve 20's AI Features

YouTube rolled out a free AI music making tool for creators. This means that video editors no longer have to go to paid subscription services to get background music - they can now use AI within YouTube.

Additionally, DaVinci Resolve 20 has rolled out a ton of new AI features that are exciting for video editors:

  • You can upload a script and the software will try to line up the shots based on the script. This is helpful if you have a bunch of video clips with dialogue shot in different places.
  • It has better magic mask capabilities.
  • You can train it on your voice and do AI voiceovers, similar to tools like ElevenLabs, but directly within DaVinci Resolve.

Overall, these new AI tools in YouTube and DaVinci Resolve 20 provide video editors with more efficient ways to speed up their workflows and incorporate AI into their video production process.

Runway's Gen 4 Turbo, Amazon's Nova Reel, and the Latest in AI Video Generation

This week, Runway introduced a new model called Gen 4 Turbo, which is a much faster way to make AI-generated videos. It can generate a 10-second video in just 30 seconds.

Amazon also has an AI video generator called Nova Reel, and it can generate AI videos up to 2 minutes long. In their blog post about it, we can see some examples of the videos it generated, like a raccoon in the water and some fish swimming around. These videos are pretty impressive and seem to be on par with a lot of the other video models we're seeing now.

Overall, we're kind of getting to the point where all the video models are getting pretty good, and it's really hard to say any one is so much better than the others. Veo 2 is still kind of the leader, but all of them seem to be catching up with each other.

Coding AI Innovations: DeepCoder-14B, Gemini API Updates, and Grok 3 API

This week saw some exciting developments in the world of AI-powered coding tools.

Together AI announced the release of DeepCoder-14B, an open-source AI coding reasoning model that they say performs on the level of OpenAI's o3-mini. They claim to be releasing the entire dataset, code, and training recipe, making it a fully open-source solution for developers.

Additionally, several new APIs were made available:

  • Gemini 2.5 Flash and Pro Live API
  • Veo 2 in the Gemini API
  • Grok 3 API

The Gemini API updates allow developers to build platforms that leverage the powerful Veo 2 video generation model. Meanwhile, the Grok 3 API release is particularly exciting, as Grok 3 has been highly impressive but previously lacked API access, limiting its integration into other tools.

Now, with Grok 3's API availability, developers can more easily incorporate its capabilities into their own applications and workflows, further expanding the possibilities for AI-powered coding and content creation.
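
For developers who want to try it, here is a minimal sketch of calling the Grok 3 API. It assumes xAI's endpoint is OpenAI-compatible at api.x.ai, as their documentation described at launch; the exact model identifier is an assumption, so verify it against the current docs.

```python
import os
from openai import OpenAI

# Minimal Grok 3 API sketch, assuming xAI's OpenAI-compatible endpoint.
# The base URL and model name reflect xAI's launch-time docs and may change.
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],   # your xAI API key
    base_url="https://api.x.ai/v1",      # xAI's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="grok-3",                      # model id is an assumption; check the docs
    messages=[{"role": "user", "content": "Suggest three tests for a parser."}],
)
print(response.choices[0].message.content)
```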

These new tools and API integrations represent significant advancements in the field of AI-assisted development, empowering coders and creators to streamline their processes and unlock new levels of productivity and creativity.

GitHub Copilot's Agent Mode and MCP Support

GitHub Copilot, the AI-powered coding assistant, has introduced two significant updates this week:

  1. Agent Mode: Copilot now has an "agent mode" where you can tell it what you want it to do, and it will continuously code toward that goal. This feature is similar to what tools like Windsurf and Cursor offer, allowing the language model to work autonomously towards a specified objective.

  2. MCP Support: Copilot has rolled out support for the MCP (Model Context Protocol) standard. MCP acts as a middleware layer between the large language model and external tools, so Copilot can connect more easily to other APIs and services (a minimal tool-server sketch follows this list).
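
To make the MCP side more concrete, here is a minimal tool-server sketch, assuming the official MCP Python SDK and its FastMCP helper; an MCP-aware client such as Copilot's agent mode would connect to a server like this and call the tool it exposes.

```python
# Minimal MCP tool-server sketch, assuming the official MCP Python SDK
# (pip install "mcp"). An MCP-capable client connects to this server over
# stdio and can then discover and call the exposed tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("build-helper")

@mcp.tool()
def lint_summary(project_path: str) -> str:
    """Return a short lint summary for a project (stubbed for the example)."""
    # A real server would shell out to an actual linter here; the stub keeps
    # the sketch self-contained.
    return f"0 errors, 2 warnings in {project_path}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```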

These updates enhance Copilot's capabilities, making it more versatile and seamlessly integrated with the broader AI ecosystem. Developers can now leverage Copilot's AI-powered coding assistance in more sophisticated and automated workflows, further boosting their productivity and efficiency.

WordPress's New AI Website Builder and Shopify's AI Hiring Policy

WordPress just announced a new AI website builder that is now live and free to use. This is an interesting development as it allows WordPress users to leverage AI technology to build websites more efficiently.

Additionally, Shopify CEO Tobi Lütke put out a statement to employees saying that teams will not be allowed to hire new people unless they can first prove that AI can't do the job. He believes this is a policy many companies will start adopting as AI continues to advance and becomes capable of performing more tasks.

The idea behind Shopify's policy is to ensure that the company is leveraging AI technology to the fullest extent possible before hiring new employees. This could lead to increased efficiency and cost savings for the company, as well as potentially opening up new opportunities for AI-powered solutions within the Shopify ecosystem.

Overall, these developments highlight the growing influence of AI technology in the world of web development and business operations. As AI continues to evolve, it will be interesting to see how other companies and platforms respond and adapt to these changes.

Exciting Robotics and Gadget Releases: Amazon Zoox, Samsung Ballie, and Kawasaki Corleo

Amazon's Zoox has begun rolling out its robotaxis in Los Angeles. This is sort of like Amazon's version of Waymo.

Samsung is finally releasing their Ballie, which is this little ball that rolls around and projects stuff on the floor and has AI. They actually showed it off at CES 2024, so over a year ago, and it looks like it's now finally coming out.

Finally, Kawasaki showed off the Corleo, a four-legged robot concept designed to be ridden like a horse or a quad. They even showed off some CGI footage of what it could look like with somebody riding around on it. Some people were sharing this as real footage, but no, it's definitely CGI. Nonetheless, it looks really cool, and based on what I've seen, it seems kind of fun. I wouldn't mind trying one out.
