A 2025 Reflection on AI Development

As the year draws to a close, AI leaders are publishing their annual reviews and technical summaries. Drawing on those, together with the 58 paper notes I accumulated in Obsidian this year, I'd like to share my own takeaways.

1. This year remained a period of technological buildup. Many impressive models and applications emerged, but truly transformative AI that significantly raises productivity or economic efficiency has yet to appear; the clearest gains are still concentrated in programming, closely tied to reinforcement learning with verifiable rewards (RLVR). If continual learning or online learning achieves a breakthrough, we may see broader and deeper impacts across certain industries and professions.
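To make the RLVR idea concrete, here is a minimal sketch of a verifiable reward for coding tasks: instead of a learned preference model, the reward is a binary signal from executing the model's code against unit tests. The function and the bare `exec` are my own illustration (real pipelines run candidates in an isolated sandbox), not any particular lab's implementation.

```python
def verifiable_reward(candidate_code: str, tests: list[tuple[tuple, object]]) -> float:
    """Binary RLVR-style reward: 1.0 if the generated code passes all tests, else 0.0.

    `candidate_code` is expected to define a function named `solve`.
    NOTE: real systems execute candidates in a sandbox, never via bare exec().
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)       # load the model's program
        solve = namespace["solve"]
        for args, expected in tests:
            if solve(*args) != expected:      # deterministic, checkable verification
                return 0.0
        return 1.0
    except Exception:
        return 0.0                            # crashes and missing functions score zero


# Example: reward for a generated addition function (hypothetical task)
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
print(verifiable_reward("def solve(a, b):\n    return a + b", tests))  # 1.0
```

The appeal is that the signal cannot be gamed the way a reward model can, which is exactly why progress has concentrated in domains like programming where such verifiers exist.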

2. Continual learning, online learning, self-play, and self-improvement will become primary directions for AI research. They are the main pathways for moving from human-curated and synthetic data to experience-driven online learning.

One major bottleneck is model memory. Future AI will no longer rely on human-labeled datasets, pre-defined tasks, or static training sets; it will instead evolve through real-time interaction with the world, self-generated tasks, and continuous self-modification. I increasingly agree with Ilya Sutskever's view that AGI may first emerge as a general-purpose AI with intelligence comparable to a 15-year-old human, which then becomes an expert in one domain after another through sustained learning.
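To picture what experience-driven learning might look like, here is a schematic loop of my own, not any published algorithm: the agent proposes its own tasks, attempts them, keeps only trajectories a verifier confirms, and updates itself online. Every callable here is a hypothetical placeholder.

```python
from typing import Any, Callable

def self_improvement_loop(
    propose_task: Callable[[list], Any],   # self-generated curriculum
    attempt: Callable[[Any], Any],         # act in the world / call tools
    verify: Callable[[Any, Any], float],   # grounded, checkable reward (cf. RLVR)
    update: Callable[[list], None],        # online update: prompts, memory, or weights
    steps: int = 1000,
) -> list:
    experience = []                        # buffer of verified episodes
    for _ in range(steps):
        task = propose_task(experience)    # harder tasks as experience grows
        trajectory = attempt(task)
        if verify(task, trajectory) > 0:   # keep only successes the verifier confirms
            experience.append((task, trajectory))
        update(experience)
    return experience
```

The key departure from today's training is that no human curates the tasks or the data; the loop's quality rests entirely on how trustworthy `verify` is.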

3. Model memory comes in short-term, medium-term, and long-term forms. The context window is short-term memory; external memory systems and prompt learning are medium-term memory; in-weights learning, which updates model parameters online, is long-term memory.
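The three tiers map naturally onto an agent's architecture. The sketch below is my own illustration of that mapping: a bounded deque as the context window, a toy key-value store standing in for external memory, and a stub where in-weights updates would happen. It mirrors no specific system.

```python
from collections import deque

class AgentMemory:
    """Illustrative three-tier memory; hypothetical, not a real system's design."""

    def __init__(self, context_limit: int = 8):
        self.context = deque(maxlen=context_limit)   # short-term: context window (FIFO)
        self.store: dict[str, str] = {}              # medium-term: external memory / retrieval

    def observe(self, text: str) -> None:
        evicted = self.context[0] if len(self.context) == self.context.maxlen else None
        self.context.append(text)
        if evicted is not None:
            # What falls out of the context window gets written to the external store.
            self.store[f"note-{len(self.store)}"] = evicted

    def recall(self, query: str) -> list[str]:
        # Toy retrieval by substring match; real systems use embeddings.
        return [v for v in self.store.values() if query.lower() in v.lower()]

    def consolidate(self, model) -> None:
        # Long-term: in-weights learning, i.e., online updates to model parameters
        # from accumulated experience. This is the unsolved tier; left as a stub.
        raise NotImplementedError
```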

Google's Titans + MIRAS line of work and Nested Learning, presented at NeurIPS 2025, look promising for overcoming this bottleneck. Next year, memory will likely become a standard capability of AI agents: solving short- and medium-term memory looks foreseeable, though reliable mechanisms for real-time parameter updates remain uncertain. If all of these challenges are overcome, it would be a significant leap in the evolution toward AGI.
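For intuition about test-time parameter updates, here is a numpy sketch of the surprise-driven update rule from the Titans paper as I read it: the memory is updated by the gradient of an associative recall loss, with momentum over past surprise and a forgetting gate. I use a plain linear memory and made-up hyperparameters; the paper uses a deeper MLP memory.

```python
import numpy as np

# Titans-style long-term memory update at test time (my reading of the paper).
# Memory M maps keys to values; "surprise" is the gradient of the recall loss
#   loss = || M @ k - v ||^2
# Update rule:
#   S_t = eta * S_{t-1} - theta * grad      (past surprise + momentary surprise)
#   M_t = (1 - alpha) * M_{t-1} + S_t       (forgetting gate, then write)

d = 16
M = np.zeros((d, d))            # linear associative memory
S = np.zeros_like(M)            # surprise momentum
eta, theta, alpha = 0.9, 0.01, 0.001   # illustrative hyperparameters

rng = np.random.default_rng(0)
for _ in range(500):
    x = rng.normal(size=d)
    k = x / np.linalg.norm(x)   # toy key projection (real models learn W_K, W_V)
    v = k                       # memorize the stream: recall each key as itself
    err = M @ k - v
    grad = 2 * np.outer(err, k)            # d(loss)/dM
    S = eta * S - theta * grad
    M = (1 - alpha) * M + S

print(np.linalg.norm(M @ k - v))  # recall error shrinks as the memory fits the stream
```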

4. Progress in unified perception models, perhaps the most practical application domain, is equally impressive. Rather than requiring separate models for visual, audio, spatial, and temporal information, a single integrated model combines a Meta Perception Encoder, a DETR variant, a SAM-style mask decoder, and SAM2-style memory mechanisms. This integration avoids conflicts between multi-task objectives while enabling shared feature learning. The Cambrian-S series introduces "spatial supersensing": models that construct internal world models rather than functioning as passive observers. In security applications, each camera becomes an intelligent agent with its own goals and comprehension capabilities, not merely a pixel recorder.
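Schematically, I picture the composition like the sketch below. This is my own block-level paraphrase of the described design, with placeholder classes and hypothetical names, not the actual architecture.

```python
# Block-level sketch of a unified perception model: one shared encoder feeds
# detection, segmentation, and a streaming memory, instead of four separate
# models. Every component here is a placeholder stub.

class UnifiedPerceptionModel:
    def __init__(self, encoder, detector, mask_decoder, memory):
        self.encoder = encoder            # Meta Perception Encoder-style backbone
        self.detector = detector          # DETR-variant set-prediction head
        self.mask_decoder = mask_decoder  # SAM-style promptable mask decoder
        self.memory = memory              # SAM2-style memory bank for video streams

    def forward(self, frame):
        feats = self.encoder(frame)              # one backbone, shared features
        feats = self.memory.condition(feats)     # conditioned on past frames
        boxes = self.detector(feats)             # detection from shared features
        masks = self.mask_decoder(feats, boxes)  # masks prompted by detections
        self.memory.write(feats, masks)          # carry identities across time
        return boxes, masks
```

Because every head reads the same features, the tasks regularize each other instead of competing, which is the point of the integration.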

5. Claude Code-style agents are building out their ecosystem, with protocols like MCP, A2A, and Skills gradually becoming industry standards. A full architecture is taking shape in which large language models serve as processors, agents function as the operating system, and Skills operate as applications. Programming agents are expanding beyond coding into other domains, especially as Skills are adopted by platforms like Codex. Domain-specific Skills give agents a fast track into specialized fields, which makes it increasingly fair to ask whether building vertical-domain LLMs is still necessary: many industry problems may be solved by combining general-purpose LLMs with agent Skills. Moreover, automatically creating and updating Skills at inference time, in the spirit of continual system-prompt learning and GEPA, opens vast possibilities for Skill applications.
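As one way to picture "Skills as applications", the sketch below scans a skills directory, matches a skill to the task, and injects its instructions into the prompt. The one-SKILL.md-per-folder layout follows the published Skills convention as I understand it; the loader and matcher are my own simplified illustration.

```python
from pathlib import Path

# Minimal sketch: skills as on-disk "applications" the agent OS can load.
# Assumed layout (per the Skills convention as I understand it): each skill
# is a folder containing a SKILL.md that describes when and how to use it.

def load_skills(skills_dir: str) -> dict[str, str]:
    """Map skill name -> full SKILL.md body."""
    skills = {}
    for path in Path(skills_dir).glob("*/SKILL.md"):
        skills[path.parent.name] = path.read_text(encoding="utf-8")
    return skills

def select_skill(task: str, skills: dict[str, str]) -> str | None:
    # Toy matcher: real agents let the LLM choose based on each skill's description.
    matches = [body for name, body in skills.items()
               if name.replace("-", " ") in task.lower()]
    return matches[0] if matches else None

def build_prompt(task: str, skills_dir: str = "./skills") -> str:
    skill = select_skill(task, load_skills(skills_dir))
    preamble = f"Follow this skill:\n{skill}\n\n" if skill else ""
    return preamble + f"Task: {task}"
```

Seen this way, a Skill created or edited at inference time is just a file write, which is what makes automatic Skill creation and updating so plausible as a mechanism.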