AI News 2025-12-13
AI Daily Brief
Summary
Zhipu AI open-sourced AutoGLM, giving AI systems device-agent capabilities to operate smartphones.
Runway introduced GWM-1, a general world model designed to simulate world dynamics and enable real-time interaction.
Terence Tao’s team used multiple AI tools to solve a 50-year-old math problem (Erdős #1026) in 48 hours.
Google released Gemini 2.5 Flash Native Audio to improve low-latency voice interaction and real-time translation.
Google is also applying Gemini to Google Translate to boost naturalness and context understanding.
Mistral AI launched Devstral 2, a new open-source coding model family whose performance approaches top closed-source models.
OPPO will ship “AI Miaoting” to turn articles into two-host podcasts.
Skywork released a mobile app bringing multi-agent parallel collaboration to the phone.
Ant open-sourced LLaDA2.0, a 100B-parameter discrete diffusion LLM with faster inference.
Today’s AI News
Zhipu AI open-sources AutoGLM, a system that lets AI operate smartphones: Zhipu AI has officially open-sourced the full codebase and models for its flagship project AutoGLM. AutoGLM is designed to let AI operate a smartphone autonomously, carrying out concrete in-app actions as a human would rather than merely answering questions. The system provides device-agent capabilities: it can understand interfaces, take actions, perceive feedback, and keep learning. Zhipu says the open-source release is meant to accelerate industry co-building, return data and privacy control to users, and catalyze a broader agent ecosystem.
Runway releases GWM-1, its first “general world model”: Runway introduced GWM-1 and positions it as going beyond image and video generation. The model aims to understand and simulate world dynamics (time, space, physics, actions, and causality) and to support interaction, control, and generalization in real-time environments. The release includes three subsystems: GWM-Worlds, GWM-Avatars, and GWM-Robotics.
Terence Tao’s group solves a 50-year-old math problem in 48 hours with AI tools: Mathematician Terence Tao and collaborators reportedly solved the long-standing math problem Erdős #1026 within 48 hours by combining multiple AI tools, including Harmonic’s math AI model “Aristotle”, AlphaEvolve, and ChatGPT Pro. The effort highlights a growing “human + human + AI” collaboration pattern for tackling complex math problems.
Google releases Gemini 2.5 Flash Native Audio for real-time voice interaction: Google released Gemini 2.5 Flash Native Audio, optimized for low-latency voice interaction. It generates speech output directly, giving real-time conversations more natural prosody. Key capabilities include smarter function calling, stronger instruction understanding, and, for the first time, continuous real-time speech-to-speech translation.
Google applies Gemini to Google Translate and launches voice-to-voice translation beta: Google announced it is integrating Gemini into Google Translate to improve translation naturalness, accuracy, and context understanding. For voice translation, Google introduced a Gemini-powered voice-to-voice translation beta: users can hear translations in real time through headphones while retaining tone and emotion.
Mistral AI launches Devstral 2 open-source coding models (123B and 24B): European AI company Mistral AI released the Devstral 2 open-source coding family, which includes a 123B flagship model and a 24B lightweight variant. The flagship scored 72.2% on SWE-bench Verified, approaching the performance of top closed-source coding models.
OPPO ColorOS to ship “AI Miaoting” that turns articles into two-host podcasts: OPPO ColorOS will launch a new feature called “AI Miaoting” in December. It lets users convert articles into a two-host podcast format with background music, aiming to make content consumption more vivid and engaging.
Skywork launches mobile app v5.0 with multi-agent parallel collaboration: The team behind the Chinese foundation model Skywork released v5.0 of its mobile app, bringing multi-agent parallel collaboration to phones. With a single voice command, the system can spin up multiple specialized agents that generate content in parallel, including key-point summaries, to-do lists, mind maps, and PPT slides.
Ant open-sources LLaDA2.0: a 100B-parameter discrete diffusion language model: Ant’s research institute open-sourced the LLaDA2.0 series, described as the first discrete diffusion LLM at the 100B-parameter scale. With a new training strategy, it reportedly maintains high generation quality while delivering 2.1× faster inference than comparable autoregressive models.
Medeo AI updates its video generation tool with an agent architecture: AI video tool Medeo AI released a new version built around an agent architecture. Users can make unlimited real-time edits through natural-language commands, such as adding or removing shots or adjusting the script, which gives creators more flexibility.
Broadcom CEO: Anthropic placed $21B in Google TPU orders: On an earnings call, Broadcom’s CEO said the company received $21B in orders from AI company Anthropic for Google’s TPUs, with delivery expected by the end of 2026. Anthropic reportedly plans to massively expand its AI infrastructure, targeting deployment of 1 million TPUs by 2026.