Composer 2.5 is now available inside Grok Build.
Composer 2.5 is a fast, highly intelligent model that excels at long-running tasks and at following complex instructions.
Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.
Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
Versatile coding agent & productivity assistant with full-modality input
Visual Agent: perception, reasoning, grounding, and search-augmented QA
Cross-harness generalization across diverse agent frameworks
One model. Sees, thinks, codes, acts.
Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.
Blog: https://qwen.ai/blog?id=qwen3.7-plus…
Qwen Studio: https://chat.qwen.ai/?models=qwen3.7-plus…
API: https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-plus&serviceSite=international…
Kaggle said its Community Benchmark SDK now supports automatic tracking for token usage, cost, and latency alongside standard evaluation results. The update pushes benchmark workflows closer to real product decisions, where teams need to understand not just which model performs best, but which one is cheapest and fastest to run. It is a useful signal that practical model economics are becoming part of mainstream benchmark tooling rather than an afterthought.
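To make the idea concrete, here is a minimal sketch of what recording token usage, cost, and latency next to a correctness score can look like. It does not use the Kaggle SDK; `RunRecord`, `track`, `summarize`, `fake_model`, and the per-1k-token prices are all hypothetical names and values invented for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One evaluated prompt, with economics recorded next to the score."""
    model: str
    correct: bool
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    cost_usd: float


def track(model, call, prompt, expected, usd_per_1k_in, usd_per_1k_out):
    """Run one prompt and record correctness, tokens, latency, and cost.

    `call` is a hypothetical callable returning
    (text, prompt_tokens, completion_tokens).
    """
    start = time.perf_counter()
    text, p_tok, c_tok = call(prompt)
    latency = time.perf_counter() - start
    cost = p_tok / 1000 * usd_per_1k_in + c_tok / 1000 * usd_per_1k_out
    return RunRecord(model, text.strip() == expected, p_tok, c_tok, latency, cost)


def summarize(records):
    """Aggregate accuracy together with total tokens, cost, and mean latency."""
    n = len(records)
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in records),
        "total_cost_usd": sum(r.cost_usd for r in records),
        "mean_latency_s": sum(r.latency_s for r in records) / n,
    }


def fake_model(prompt):
    """Stand-in for a real model call: fixed answer and fixed token counts."""
    return "4", 10, 2


# Assumed prices: 1.00 USD per 1k input tokens, 2.00 USD per 1k output tokens.
records = [track("toy-model", fake_model, "What is 2 + 2?", "4", 1.0, 2.0)]
summary = summarize(records)
```

With a summary like this per model, a team can compare candidates on accuracy, spend, and speed in one table instead of on accuracy alone.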
Google has completed its acquisition of Wiz, the cloud security company, with Sundar Pichai welcoming the Wiz team publicly. The deal gives Google a broad cloud security platform that protects workloads across providers, strengthening its pitch to enterprise customers who run multicloud environments. Pichai framed the acquisition as giving customers a comprehensive platform to secure their cloud and AI workloads. Wiz co-founder Assaf Rappaport previously turned down a $23 billion offer from Google, making this closing a notable reversal and one of the largest cybersecurity acquisitions in recent memory.
Rakuten uses Codex, the coding agent from OpenAI, to ship software faster and more safely, reducing MTTR by 50%, automating CI/CD reviews, and delivering full-stack builds in weeks.
Feng Qingyang had always hoped to launch his own company, but he never thought this would be how—or that the day would come this fast. Feng, a 27-year-old software engineer based in Beijing, started tinkering with OpenClaw, a popular new open-source AI tool that can take over a device and autonomously complete tasks for a…
Wayfair uses OpenAI models to improve ecommerce support and product catalog accuracy, automating ticket triage and enhancing millions of product attributes at scale.
How OpenAI built an agent runtime using the Responses API, shell tool, and hosted containers to run secure, scalable agents with files, tools, and state.
Abacus.AI CEO Bindu Reddy said her team is racing to switch coding workloads to GPT-5.4 because it performs better on fairly complex codebases and harder problems. In parallel, Augment Code said GPT-5.4 had become the default model in its agent development environment and framed it as especially strong for agent coordination. Taken together, the posts suggest GPT-5.4’s momentum is no longer just about benchmarks: it is turning into real adoption pressure inside coding and agent-engineering workflows.
A widely shared post from trq212 said Claude Code now supports `/btw`, a command for starting side-chain conversations while Claude continues working on the main task. Jason Liu reposted the feature to his audience, helping turn it into one of the most visible coding-agent workflow updates in this scrape batch. The change points toward a more interruptible, multitasking model for agent-assisted development rather than a single linear prompt-and-wait loop.
AgentMail announced a $6 million seed round led by General Catalyst, and Yohei Nakajima amplified the raise by arguing that email is becoming a core layer for AI agents, not just a communications channel. In his framing, email gives agents identity, authentication, notifications, and access to account creation flows such as self-service API signup and key retrieval. The combined posts cast AgentMail as infrastructure for practical autonomous workflows rather than another narrow inbox tool.
Two posts from the last 48 hours point to GPT-5.4 gaining real traction in coding-agent workflows. Augment Code said GPT-5.4 is now the default model in Intent and highlighted it as especially strong for agent coordination, while Abacus.AI CEO Bindu Reddy said her team is racing to move coding workloads onto GPT-5.4 because it performs better on fairly complex codebases and hard problems. The takeaway is that GPT-5.4 may be moving beyond benchmark headlines into day-to-day developer tooling decisions.
Andreessen Horowitz says new per-capita usage analysis across the 10 largest LLM products shows the United States ranking only 20th in AI adoption despite building many of the category’s biggest products. The firm framed the finding as evidence that consumer AI is becoming a more global market than top-line traffic charts imply. If that view is right, AI distribution and user behavior may become at least as strategically important as model leadership in the next phase of consumer competition.
AgentMail says it has raised a $6 million seed round led by General Catalyst, with investors including Y Combinator, Paul Graham, Dharmesh Shah, and Matt Shumer. The pitch is simple but strategic: every AI agent needs its own inbox. If that framing sticks, email could become one of the core infrastructure layers for autonomous software, giving agents a native way to receive updates, authenticate into workflows, and interact with the outside world without piggybacking on a human account.
Techmeme highlighted Axios reporting that Nielsen’s Gracenote has sued OpenAI for copyright infringement, alleging that OpenAI copied Gracenote’s data and the relational framework it uses to connect metadata. The case is notable because it extends the AI copyright fight beyond training on expressive works into the structured data layers that help classify and link content. If courts take these claims seriously, the legal risk for AI companies could widen from scraped media itself to the metadata systems that make media usable at scale.
A post flagged by The Rundown AI says OpenAI has added interactive visuals for more than 70 math and science concepts inside ChatGPT, including variable sliders, live graphs, and animated demonstrations. The update matters because visual explanations can make ChatGPT more useful for education and self-study than text responses alone, especially for subjects where intuition comes from seeing systems change in real time. It is another sign that OpenAI is expanding ChatGPT from a chatbot into a broader interactive product surface for learning and problem-solving.
Y Combinator CEO Garry Tan said builders should not sleep on Groq paired with Llama 4 Maverick, describing the combination as very useful for low-latency tasks. The post is notable because real-time responsiveness remains one of the hardest constraints in production AI systems, especially for assistants and agent workflows where delays directly affect usability. Tan’s endorsement suggests the conversation is shifting from pure benchmark leadership toward which model-and-inference stacks actually feel fast enough to use continuously.
LangChain announced `langgraph deploy`, a CLI flow that deploys an agent to LangSmith Deployment with a single command. The release is notable because it targets a familiar pain point in the agent stack: moving from experiments into something teams can run and monitor in production without stitching together custom deployment steps. In effect, LangChain is packaging deployment as a first-class part of the LangGraph workflow rather than an afterthought for platform engineers.
Google AI Developers said Gemini Embedding 2 is now available in preview through the Gemini API and Vertex AI, describing it as the company’s most capable and first fully multimodal embedding model built on the Gemini architecture. Jeff Dean separately amplified the launch, saying the model brings text, images, video, audio, and documents into the same embedding space. The update matters because embeddings sit underneath search, retrieval, and recommendation systems, and a stronger multimodal option gives developers a more practical foundation for building cross-format AI products.
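A shared embedding space means a text query can be scored directly against images, audio, or documents with the same similarity function. The sketch below shows that retrieval step only. It does not call the Gemini API, and the three-dimensional vectors, file names, and the `nearest` helper are made up for illustration rather than produced by any model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, index):
    """Return the indexed item most similar to the query, whatever its modality."""
    return max(index, key=lambda item: cosine(query_vec, item["vector"]))


# Toy index: items of different modalities living in one vector space.
# These vectors are hand-written placeholders, not real embeddings.
index = [
    {"id": "photo_of_cat.jpg", "modality": "image", "vector": [0.9, 0.1, 0.0]},
    {"id": "quarterly_report.pdf", "modality": "document", "vector": [0.0, 0.2, 0.9]},
    {"id": "meowing.wav", "modality": "audio", "vector": [0.8, 0.3, 0.1]},
]

# Placeholder for the embedding of a text query such as "cat".
query = [1.0, 0.0, 0.0]
best = nearest(query, index)
```

In a real system the vectors would come from the embedding model and the linear scan would typically be replaced by a vector index, but the ranking logic stays the same.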
A Hugging Face post shared by Georgi Gerganov introduced Storage Buckets, a new S3-like object storage option on the Hugging Face Hub and the first new repository type the platform has added in four years. Unlike the Hub’s standard versioned repos, Storage Buckets are mutable and non-versioned, with pricing positioned below Amazon S3. The release is significant because it shows Hugging Face expanding from model distribution into underlying storage infrastructure for AI teams building production systems.
Dimillian said he is joining OpenAI at the end of the month and will work on Codex as part of the developer experience team. He said he plans to bring what he learned from building Codex Monitor into the role, signaling that OpenAI is continuing to invest not just in coding models themselves, but in the tooling and workflows around how developers use them. The post matters because it points to deeper product focus on making Codex more usable inside real software teams, where monitoring, feedback loops, and developer experience often determine adoption.
Notion introduced Number Charts, a new dashboard element for displaying a single metric with customizable threshold-based colors. In the company’s post, the feature is positioned as a fast way to let one number tell the story, with yellow, green, and red states for quick status scanning. The launch broadens Notion’s reporting and dashboard toolkit, giving teams a simpler way to surface KPIs inside shared workspaces.
Andreessen Horowitz said the latest edition of its Top 100 Gen AI Consumer Apps ranking shows how quickly the consumer AI market is evolving beyond a narrow set of chat products. The firm argued that the newest leaders are increasingly global, multimodal, and embedded in everyday workflows, while Erik Torenberg separately amplified the release as evidence that the category deserves a refreshed lens on usage. The post matters because a16z’s ranking is one of the most widely watched snapshots of consumer AI adoption, and this edition points to a market where sustained engagement and product diversity are starting to matter as much as raw novelty.
Vinod Khosla said the real bar for robotics is autonomous performance in production environments, not polished lab demos, while highlighting Rhoda AI as a startup that impressed him with strong results from remarkably little robot training data. He emphasized the company’s use of internet-scale video pretraining to build a physical prior before deployment, suggesting a path to more general robotic capability without relying on massive amounts of expensive robot-specific data. The post matters because it captures a shift in how leading investors are judging physical AI: not by whether a robot can complete a staged demo, but by whether it can generalize reliably in the messy settings where commercial value is actually created.
Noam Brown said the core recipe behind frontier reasoning models looks surprisingly similar to AlphaGo. In his framing, the pattern is: imitate large volumes of human data, scale inference-time reasoning, and then apply reinforcement learning to move beyond imitation. The post stands out because it offers a concise mental model for how modern reasoning systems are evolving, linking today’s chain-of-thought and test-time compute strategies back to a landmark earlier system.
Hume said it is open-sourcing TADA, a text-audio dual alignment model designed to generate text and speech in one synchronized stream. The company said the architecture is meant to reduce token-level hallucinations while improving response speed, two of the main constraints that have limited real-time voice agents. The post matters because open-source voice models have often lagged behind closed systems on reliability and interaction quality, and Hume is positioning TADA as a practical step toward production-grade spoken AI.
Posts amplified by Databricks accounts point to OfficeQA Pro as a benchmark designed to test grounded reasoning on realistic enterprise workflows, including finding documents, extracting values, and performing analyses. The key claim is that frontier agents still score under 50 percent end-to-end. If that result holds up, it suggests the gap between flashy reasoning benchmarks and dependable workplace automation is still much wider than the AI hype cycle implies.
Google DeepMind said that a decade after AlphaGo, the techniques pioneered in that system are still compounding across the company’s research stack. In a new retrospective, the lab said those methods have already been used to prove mathematical statements and to assist scientists in making new discoveries. The broader significance is that DeepMind is presenting AlphaGo not as a historical trophy but as an early foundation for agentic systems that can reason through hard scientific problems.
Dan Shipper shared a custom Codex skill that connects to PostHog and a production database, then scans product data to identify bottlenecks and actionable growth insights. He described it as a “growth investigator” that works surprisingly well, pointing to a broader pattern in this batch where agentic tools are creeping from code generation into product analysis and marketing operations. The idea matters because it hints at a next phase for coding agents: not just helping teams build software, but helping them diagnose why the software is or is not growing.
Stanford researcher Percy Liang argued that simulation is becoming the next frontier for AI because the field’s most impressive breakthroughs happen when models can take actions inside a clear environment and learn from well-defined consequences. He pointed to examples like AlphaGo, IMO-level problem solving, and systems that can write complete apps from scratch inside a Docker container, where reinforcement learning can safely explore and improve. The post matters because it frames the next wave of progress less as a race for isolated reasoning benchmarks and more as a race to build realistic environments where models can act, be evaluated, and be improved end to end. In the same batch, Databricks promoted OfficeQA Pro as an enterprise benchmark for grounded reasoning, reinforcing the idea that AI evaluation is moving toward task environments rather than standalone tests.
Runway says users can now access Characters directly inside the web app, where they can try preset personalities or create their own real-time assistants. The announcement turns the earlier Characters launch into a more concrete product surface and hints at a broader strategy: AI media tools are evolving from generation interfaces into persistent, interactive agent environments. Examples shared by users already show the feature being adapted for gaming guides and niche knowledge assistants.
AIFrontliner highlighted the release of LTX-2.3, describing it as a major overhaul of the open-weights video model with public weights, training code, benchmarks, and LoRAs. The thread called out sharper output from a rebuilt VAE, better image-to-video motion, native portrait generation up to 1080p, cleaner audio, and direct API access for builders. The release matters because it strengthens the open-source side of the fast-moving video model race at a time when many of the best-known systems are still gated behind closed interfaces.
NVIDIA said it is partnering with Thinking Machines to deploy at least one gigawatt of Vera Rubin systems for frontier AI model training. The announcement matters because it pushes frontier infrastructure talk beyond chip counts and into utility-scale capacity, signaling that the next tier of model builders will be judged partly by how much power and compute they can stand up, not just by benchmark results. For the broader market, it is another sign that frontier AI is becoming an industrial systems race spanning hardware, power, and platform control.
Y Combinator CEO Garry Tan spotlighted Legora's announcement that it has raised $550 million in a Series D led by Accel at a $5.55 billion valuation. The post is a reminder that the AI funding market is still rewarding companies with a sharp vertical wedge and credible enterprise adoption. For legal AI specifically, it suggests the category is moving from experimentation into major-scale capital formation.
Google launch-week updates highlighted a broader Gemini push into productivity and retrieval. Logan Kilpatrick said the company is rolling out a new Gemini-powered Docs, Sheets, Slides, and Drive experience with AI Overviews, fully editable AI-generated slides, and new grounding sources that make document writing more context aware. Hours later, he also introduced Gemini Embedding 2 as a new multimodal embedding model spanning text, images, video, audio, and documents. Together the updates matter because they show Google tightening the loop between where users create work and the multimodal context systems that help AI understand it.
A small but clear cluster from automation platforms suggests the category is moving beyond basic app-to-app recipes and toward AI-native workflow infrastructure. Make announced new If-else and Merge modules for cleaner branching logic, n8n promoted builder sessions focused on webhooks, MCPs, subworkflows, and error handling, and Zapier framed itself as a hands-on partner for getting AI projects into production. Taken together, the posts matter because they show automation vendors converging on the same promise: helping teams operationalize agents and more complex AI workflows rather than just stitch together SaaS tools.
A post amplified by Paul Graham pointed to an Unsloth repository with more than 250 notebooks for LLM training and inference, including workflows for RL, vision, audio, embeddings, and TTS. The notable part is the accessibility claim: developers can follow the stack locally on roughly 3GB of VRAM or run it for free on Colab. That framing makes the release a useful signal that open-source training tooling is continuing to move downmarket toward solo builders and smaller teams.
In the race to adopt and show value from AI, enterprises are moving faster than ever to deploy agentic AI as copilots, assistants, and autonomous task-runners. In late 2025, nearly two-thirds of companies were experimenting with AI agents, while 88% were using AI in at least one business function, up from 78% in 2024, according…
Pokémon Go was the world’s first augmented-reality megahit. Released in 2016 by the Google spinout Niantic, the AR twist on the juggernaut Pokémon franchise fast became a global phenomenon. From Chicago to Oslo to Enoshima, players hit the streets in the urgent hope of catching a Jigglypuff or a Squirtle or (with a huge amount…
Simon Willison shared poll results from 539 recent software job interviewees showing that 43% said experience with AI programming tools was required, 25% said it was optional, and only 32% said it did not come up at all. The finding was quickly echoed by Hugging Face cofounder Thom Wolf, who joked that applying for developer jobs without AI tool experience now looks like applying to be a telephone operator in 2026. The discussion matters because it suggests coding agents are shifting from a personal productivity edge to a concrete expectation in software hiring and review workflows.
Mira Murati said Thinking Machines is working with Nvidia to deploy at least 1 gigawatt of Vera Rubin systems, describing the effort as part of a push to bring adaptable collaborative AI to everyone. The post adds a direct founder-level confirmation to the company’s earlier infrastructure narrative and underscores how aggressively new AI labs are now signaling compute scale as a strategic moat.
Mira Murati / Thinking Machines, Mar 10, via @MiraMurati
Today we announced new beta features for Gemini in Sheets to help you create, organize and edit entire sheets, from basic tasks to complex data analysis — just describe …
Akshay Pachaar shared a workflow for running Claude Code against local models by pointing the tool at a llama.cpp server with the ANTHROPIC_BASE_URL environment variable, which removes API costs and keeps data on the user’s own machine. The idea stands out because it treats Claude Code less like a closed product and more like a reusable interface that can be swapped onto different backends. In the same batch, Nicolas Camara pitched browser infrastructure for agents through CDP and sandbox access, while Simon Kirane released an open-source “Make it Heavy” framework that recreates Grok Heavy-style behavior in the terminal. Together, the posts show agent tooling becoming more modular, hackable, and self-hosted.
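The backend swap described in the first post comes down to one environment variable. A minimal sketch, assuming a llama.cpp server is already listening at `http://localhost:8080` (an assumed address) and that the Claude Code CLI is installed as `claude`; the helper name `local_backend_env` is hypothetical.

```python
import os


def local_backend_env(base_url, base_env=None):
    """Copy an environment and point ANTHROPIC_BASE_URL at a local server.

    `base_url` is assumed to be a locally running llama.cpp server;
    the caller's real environment is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


# Assumed local address; adjust to wherever the server actually listens.
env = local_backend_env("http://localhost:8080", base_env={"PATH": "/usr/bin"})

# To launch the tool against the local backend, one could then run
# (not executed here, and assuming the CLI binary is named `claude`):
#   import subprocess
#   subprocess.run(["claude"], env=env)
```

Building a copied environment rather than mutating `os.environ` keeps the override scoped to the one child process, so other tools in the same session still reach the hosted API.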
Sara Hooker said Adaption AI is launching a research grant program that gives academic researchers around the world access to the company’s platform. The move stands out because compute and model access remain bottlenecks for many researchers, especially outside major AI centers, so grant programs can meaningfully shape who gets to experiment and publish. For Adaption, the announcement is also a distribution play: broader academic usage can turn into both technical feedback and long-term ecosystem influence.
Lambda Labs said it will appear at Nvidia GTC 2026 with booth demos built on Nvidia Blackwell architecture and an expert session covering Vera Rubin NVL72 and Nvidia GB300 NVL72. The preview is notable because it ties Lambda’s positioning directly to the next wave of high-end AI infrastructure that enterprises are watching for training and inference deployments. In practice, the post works as an early signal that GTC will again be a venue where infrastructure providers compete on access to Nvidia’s newest systems.
Security researcher Lukasz Olejnik said Amazon is holding a mandatory meeting about AI breaking internal systems after a briefing note described a trend of incidents with “high blast radius” caused by “Gen-AI assisted changes,” alongside incomplete best practices and safeguards. Gary Marcus quickly amplified the warning as evidence that reliability concerns around AI-assisted engineering are no longer theoretical. The post matters because it points to a new phase in the AI tooling story: companies are no longer just measuring how much agentic coding can accelerate delivery, but how much operational risk it can introduce when used across large production environments.
Marktechpost reported that ByteDance has released DeerFlow 2.0, an open-source “SuperAgent” framework designed to orchestrate sub-agents, memory systems, and sandboxes for more complex workflows. The release is notable because it reflects a broader shift in open-source AI tooling away from single-agent chat interfaces and toward execution stacks built for multi-step autonomous work. For builders, it points to a more modular way to compose research, coding, and task automation systems.
Aravind Srinivas showcased two community-built examples of Perplexity Computer being used for practical consumer automation: one moved a Spotify playlist to YouTube Music from a single pasted URL, and another created a peer-to-peer file transfer app with direct encrypted transfer and no account requirement. The significance is not just the individual demos but the pattern they suggest. Computer-use agents are starting to look like a new layer for coordinating work across existing consumer apps and services, turning awkward manual flows into one-step tasks. That gives Perplexity Computer a clearer product identity as an automation surface rather than just a flashy demo environment.