Top AI News Weekly

Week 10 · Mar 2 – Mar 8, 2026

GPT-5.4 launches with a 1M-token context window and native computer use. Sarvam AI open-sources Sarvam-105B, a 106B-parameter MoE covering 22 Indian languages. Immersive media surges with Helios real-time video generation, Kiwi-Edit, and LTX-2.3. CUDA Agent achieves a 2.11x GPU kernel speedup via agentic RL. WorldMonitor and GitNexus lead developer tooling.

54 launches and research drops that matter for enterprise AI builders: curated, tagged, and ready for your next roadmap sync.

New drops: 54
Unique sources: 47
Key themes: Developer · Immersive · Agents

Frontier Models & Research

New reasoning systems, world models, and alignment papers that move the state of the art.

Frontier LLM · OpenAI

GPT-5.4

OpenAI's most capable unified model, with native computer use, a 1M-token context window, and 128K-token maximum output. Available in Standard, Thinking, and Pro variants. OpenAI reports 33% fewer false claims per response than GPT-5.2.

View release ↗
Frontier LLM · Alibaba Qwen

Qwen3.5 Collection

Full Qwen3.5 model family with 24 variants from 0.8B to 397B-A17B. All are multimodal image-text-to-text models with a hybrid Gated DeltaNet + sparse MoE architecture and a 262K-token native context.

View model ↗
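
For readers new to Gated DeltaNet: instead of a growing KV cache, each such layer carries a fixed-size state matrix updated by the gated delta rule S_t = α_t · S_{t−1} (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ and read out as o_t = S_t q_t (the form given in the Gated DeltaNet paper). The sketch below is a minimal single-head NumPy illustration with assumed toy dimensions and constant gates; `gated_delta_step` is a hypothetical name, and this is not Qwen3.5's implementation (real implementations typically compute the gates from the input and use chunked parallel kernels).

```python
import numpy as np


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of the gated delta rule (single head).

    S: (d_v, d_k) state matrix; q, k: (d_k,) vectors; v: (d_v,) vector.
    alpha in (0, 1] decays the old state; beta in (0, 1] sets write strength.
    """
    k = k / np.linalg.norm(k)  # keys are L2-normalized
    d_k = k.shape[0]
    # Decay the state and partially erase what it stores along direction k ...
    S = alpha * S @ (np.eye(d_k) - beta * np.outer(k, k))
    # ... then write the new key -> value association.
    S = S + beta * np.outer(v, k)
    return S, S @ q  # read out with the query


rng = np.random.default_rng(0)
d_k, d_v, seq_len = 4, 3, 8  # toy sizes (assumptions)
S = np.zeros((d_v, d_k))
for _ in range(seq_len):
    q, k, v = rng.normal(size=d_k), rng.normal(size=d_k), rng.normal(size=d_v)
    S, o = gated_delta_step(S, q, k, v, alpha=0.9, beta=0.5)
print(S.shape, o.shape)  # state size does not grow with seq_len
```

The state stays (d_v × d_k) however long the sequence gets. With α = β = 1, writing a new value under the same key overwrites the old one, which is the "delta" in delta rule.
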
Frontier LLM · Sarvam AI

Sarvam-105B

Open-source MoE model with 106B total / 10.3B active params supporting 22 Indian languages. Top-8 routing over 128 experts with 128K context. Scores 90.6 MMLU, 98.6 Math500, and 96.7 on AIME 25 with tools.

View model ↗
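
Top-8 routing over 128 experts means each token is scored against all 128 experts, but only the 8 best-scoring expert FFNs actually run. A minimal pure-Python sketch of that selection step, assuming softmax over the selected logits only; routers differ (some use sigmoid scores or load-balancing bias terms), we have not checked which variant Sarvam-105B uses, and `top_k_route` is a hypothetical name.

```python
import math
import random


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gates.

    Assumption: softmax is taken over the selected logits only.
    Returns {expert_index: gate_weight}.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


NUM_EXPERTS, TOP_K = 128, 8  # figures from the Sarvam-105B description
random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = top_k_route(router_logits, TOP_K)
print(len(gates), round(sum(gates.values()), 6))  # 8 experts, gates sum to 1
```

Running 8 of 128 experts per token is a large part of why active parameters (10.3B) sit far below the total (106B); attention and embedding weights are active for every token, so the ratio is not exactly 8/128.
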
Vision Model · Microsoft

Phi-4-Reasoning-Vision-15B

15B multimodal reasoning model with a SigLIP-2 vision encoder and mid-fusion architecture. Dynamic THINK/NOTHINK modes enable selective reasoning. Scores 88.2% on ScreenSpot-V2 GUI grounding; trained on 240 NVIDIA B200 GPUs.

View model ↗
Embedding Model · Jina AI

Jina Embeddings V5 Text-Small

677M-param embedding model based on Qwen3-0.6B-Base with 1024-dim Matryoshka embeddings. Supports 119+ languages and 32K tokens. Billed as the highest-performing multilingual embedding model under 1B parameters.

View model ↗
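
Matryoshka embeddings are trained so the leading dimensions carry the most information, which lets you keep only a prefix of the 1024-dim vector to save storage. A minimal sketch of the client-side step (truncate, then re-normalize), using toy 4-dim vectors in place of real model outputs; the helper names are hypothetical, and the truncation sizes Jina supports should be checked in its model card.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka embedding, then L2-normalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def cosine(a, b):
    # Both inputs are unit-length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


full_a = [0.5, 0.5, 0.5, 0.5]    # toy stand-ins for 1024-dim model outputs
full_b = [0.5, 0.5, -0.5, -0.5]
for dim in (4, 2):
    a, b = truncate_embedding(full_a, dim), truncate_embedding(full_b, dim)
    print(dim, round(cosine(a, b), 3))
```

The toy vectors show why the training objective matters: naive truncation of arbitrary vectors can change similarities sharply (from 0.0 at 4 dims to 1.0 at 2 dims here), and Matryoshka training is what keeps truncated prefixes useful. Cutting 1024 dims to 256 shrinks vector storage 4x.
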
AI Safety Research · Tsinghua University

H-Neurons

Research identifying hallucination-associated neurons in LLMs. Fewer than 0.1% of all neurons reliably predict hallucinations, with strong cross-scenario generalization. The authors trace the origin of these neurons to the pre-training phase.

Read paper ↗
Inference Acceleration · Research

Spectrum

Speeds up diffusion model inference by up to 4.79x on FLUX.1 by forecasting features with Chebyshev polynomials in the spectral domain. Enables global, long-range feature reuse with non-compounding error bounds.

View project ↗
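
The core idea of feature forecasting is to skip some denoising steps by predicting a layer's features from the trajectory cached at earlier steps. The NumPy sketch below fits a low-degree Chebyshev series per feature channel over past timesteps and extrapolates to the next one. It is a simplified illustration under our own assumptions (toy data, degree 2, hypothetical `forecast_features` name), not Spectrum's actual algorithm or its error-bound machinery.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def forecast_features(times, feats, t_next, deg=2):
    """Extrapolate cached features to timestep t_next with a Chebyshev fit.

    times: (M,) timesteps already computed; feats: (M, K) features cached there.
    Each of the K channels gets its own degree-`deg` Chebyshev series.
    """
    lo, hi = times.min(), times.max()

    def to_unit(t):
        # Chebyshev polynomials are well-conditioned on [-1, 1].
        return 2.0 * (t - lo) / (hi - lo) - 1.0

    coefs = C.chebfit(to_unit(times), feats, deg)  # shape (deg + 1, K)
    return C.chebval(to_unit(t_next), coefs)       # shape (K,)


times = np.array([1.0, 0.9, 0.8, 0.7])               # steps already run
feats = np.stack([times**2, 3 * times + 1], axis=1)  # toy smooth trajectories
pred = forecast_features(times, feats, 0.6)          # skip the network at t = 0.6
print(pred)
```

In a caching sampler of this kind (our assumption about the surrounding loop), the forecast stands in for the expensive network call at skipped steps, with periodic full evaluations to refresh the cache; the refresh interval is the speed/quality knob.
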
Analysis · Sebastian Raschka

Open-Weight LLM Architecture Review

Deep-dive analysis of 10 open-weight LLM architectures from Jan–Feb 2026 including Trinity 400B, Kimi K2.5 1T, GLM-5 744B, Qwen3.5 397B, and Sarvam 105B. Covers MoE design, attention mechanisms, and efficiency trade-offs.

View release ↗

Immersive Media & Simulation

Video, audio, and physics-native generation techniques shaping spatial computing.

Image Editing · FireRedTeam

FireRed-Image-Edit-1.1

Open-source image editing foundation model with SOTA identity consistency, multi-element fusion (10+ elements), portrait makeup, and photo restoration. 4.5s end-to-end generation with 30GB VRAM. Apache 2.0 license.

View model ↗
Video Generation · LTX

LTX-2.3 Video Engine

DiT-based video generation model with synchronized audio from text, image, or audio inputs. Supports up to 1080p native portrait video, clips up to 20 seconds long, and a redesigned VAE for sharper detail. Open-source on HuggingFace.

View release ↗
Video Editing · ShowLab / NUS

Kiwi-Edit

Video editing framework driven by natural-language instructions and reference images at 720p. Combines a multimodal LLM with a video diffusion transformer. Uses a 3-stage training curriculum on 477K reference-guided quadruplets.

View project ↗
Image Editing · Tencent Hunyuan

HY-WU (Hunyuan Functional Memory)

Generates instance-conditioned LoRA adapters on-the-fly without fine-tuning for personalized image editing. Handles clothing fusion, character outfit migration, face identity transfer, and virtual try-on at inference time.

View project ↗
Video Generation · PKU Yuan Group

Helios

14B autoregressive diffusion model generating minute-scale videos at 19.5 FPS on a single H100. Addresses video drift without self-forcing or keyframe sampling through drift-simulating training strategies.

View project ↗
Image Generation · PKU DA Group

SpatialT2I

Enhances spatial understanding in text-to-image models using a SpatialScore reward model trained on 80K+ preference pairs. A GRPO online RL pipeline improves spatial accuracy over FLUX.1-dev baselines.

View project ↗
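
GRPO avoids training a value network: for each prompt it samples a group of outputs, scores them with the reward model, and uses each output's reward relative to its group as the advantage. A minimal sketch of that advantage computation, with made-up SpatialScore values standing in for real reward-model outputs; normalizing by the population standard deviation is our choice, and implementations vary.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each sample's reward, standardized against
    the other samples drawn for the same prompt (no value network needed)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical SpatialScore rewards for 4 images sampled from one prompt.
rewards = [0.82, 0.40, 0.65, 0.40]
adv = group_relative_advantages(rewards)
print([round(a, 2) for a in adv])
```

The policy-gradient update then raises the likelihood of images with positive advantage (better spatial layout than their siblings) and lowers it for negative ones.
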
Image Inpainting · Research

HiFi-Inpaint

DiT-based framework for seamlessly compositing product images into human photos for advertising and e-commerce. Shared Enhancement Attention (SEA) and Detail-Aware Loss preserve fine textures and text. Includes HP-Image-40K dataset.

View project ↗
Video Editing · Research

FREE-Edit

Zero-shot video editing that propagates first-frame edits throughout a video without fine-tuning. Editing-awaRE (REE) injection adaptively modulates token intensity using optical flow to track edited regions.

View project ↗
Video Generation · Research

RealWonder

Real-time physics-conditioned video generation at 13.2 FPS from a single image. Uses physics simulation as an intermediary to translate forces and robotic manipulations into visual outputs across rigid, deformable, fluid, and granular materials.

View project ↗
3D Rendering · NVIDIA

DiffusionHarmonizer

Single-step temporally-conditioned enhancer that converts NeRF/3DGS renderings into temporally consistent, realistic outputs for robot simulation and autonomous vehicle testing. Runs on a single GPU during online simulation.

View release ↗
3D Generation · Research

ArtiFixer

Enhances and extends 3D scene reconstruction by generating plausible novel views in unobserved areas. Combines autoregressive diffusion with 3D representations for scene generation from sparse reconstructions or text prompts.

View project ↗
3D Tracking · Research

Track4World

Feedforward world-centric dense 3D tracking of all pixels from monocular video. Novel 3D correlation scheme simultaneously estimates pixel-wise 2D and 3D dense flow between arbitrary frame pairs using a VGGT-style ViT.

View project ↗
3D Understanding · HKU / Xiaomi

Utonia

Unified self-supervised encoder for all point cloud domains (remote sensing, LiDAR, indoor, CAD, video). Introduces Causal Modality Blinding, Perceptual Granularity Rescale, and RoPE for cross-domain spatial encoding.

View project ↗
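
Rotary position embeddings (RoPE) encode position by rotating pairs of query/key channels through position-dependent angles, so attention scores depend only on relative offsets. A minimal pure-Python sketch covering a single axis; for point clouds, `pos` can be a continuous coordinate, and rotating separate channel groups for x, y, and z is one common 3D extension that we assume here rather than take from the Utonia paper.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate consecutive channel pairs of x by angles proportional to pos.

    Pair j (channels 2j, 2j+1) rotates at frequency base ** (-2j / d).
    """
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]  # toy query/key vectors
k = [1.0, 0.4, -0.7, 0.2]
# Same relative offset (3) at different absolute positions.
print(round(dot(rope(q, 5.0), rope(k, 2.0)), 6), round(dot(rope(q, 13.0), rope(k, 10.0)), 6))
```

Both scores match because the query-key offset is 3 in each case, a property that is useful when absolute coordinates differ in scale and origin across domains.
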
360° Video Generation · Research

CubeComposer

Spatio-temporal autoregressive diffusion model that natively generates 4K 360-degree videos. Decomposes video into cubemap faces with sparse context attention, cube-aware positional encoding, and continuity blending to eliminate boundary seams.

View project ↗
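
A cubemap represents a 360-degree frame as six square faces, one per axis direction; each viewing direction belongs to the face of its largest-magnitude coordinate. A minimal sketch of that mapping, plus the equirectangular-to-direction step that 360 video usually starts from. The face labels and u/v orientation follow one simple convention of our own; tools such as OpenGL flip some axes.

```python
import math


def equirect_to_direction(lon, lat):
    """Unit direction for longitude lon in [-pi, pi], latitude lat in [-pi/2, pi/2]."""
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def direction_to_cube_face(x, y, z):
    """Map a direction to (face, u, v) with u, v in [-1, 1] on that face.

    The face is chosen by the dominant axis; u and v are the other two
    coordinates divided by the dominant magnitude.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return ("+x" if x > 0 else "-x", y / ax, z / ax)
    if ay >= az:
        return ("+y" if y > 0 else "-y", x / ay, z / ay)
    return ("+z" if z > 0 else "-z", x / az, y / az)


# Straight up in an equirectangular frame lands on the top (+z) face.
print(direction_to_cube_face(*equirect_to_direction(0.0, math.pi / 2))[0])
```

Adjacent faces share edges, so content generated face by face can disagree where faces meet; that is the seam problem CubeComposer's continuity blending targets.
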

Agents & Embodied Intelligence

Embodied agents learning to act in complex virtual and hybrid worlds.

GUI Agent · InclusionAI

UI-Venus-1.5-2B

Unified 2B GUI agent for web, mobile, and desktop built on Qwen3-VL. 4-stage training pipeline (Mid-Training, Offline-RL, Online-RL, Model Merging) across 10B tokens and 30+ GUI datasets. 57.7% on ScreenSpot-Pro.

View model ↗
Robotics · BIGAI / Unitree

OmniXtreme

Unified humanoid robot policy for extreme acrobatic movements including backflips, handstands, and combat sequences. DAgger-based Flow Matching with Power-Safety Regularization prevents unsafe energy absorption.

View project ↗
GPU Optimization · ByteDance Seed / Tsinghua

CUDA Agent

Large-scale agentic RL system that optimizes GPU kernels, achieving 2.11x speedup over torch.compile on KernelBench. Staged RL training (warmup, rejection fine-tuning, multi-turn optimization) up to 128K context.

View project ↗
Agent Framework · GitAgent

GitAgent

Open standard for defining and managing AI agents as version-controlled files in git repos. Framework-agnostic (Claude, OpenAI, CrewAI), MIT-licensed CLI for scaffolding, validation, and export to multiple runtimes.

View release ↗
Voice AI Framework · TEN Framework

TEN Framework

Open-source framework for building conversational voice AI agents with real-time multi-modal capabilities. Supports AI-powered voice, video, and interactive applications. 10K+ GitHub stars.

View repo ↗
Agent Memory · NevaMind AI

memU

Memory system for 24/7 proactive agents such as OpenClaw, with MCP integration, sandbox capabilities, and Claude Skills support. Designed for persistent, context-aware agent workflows. 12K+ GitHub stars.

View repo ↗
Agent Training · Agent-on-the-Fly

Memento

Fine-tunes LLM agents without fine-tuning the underlying LLMs. Enables efficient agent behavior specialization while preserving the base model's general capabilities. 2.3K+ GitHub stars.

View repo ↗
Agent Skills · coreyhaines31

Marketing Skills for Claude Code

Marketing skills package for Claude Code and AI agents covering CRO, copywriting, SEO, analytics, and growth engineering. 11K+ GitHub stars.

View repo ↗
Agent Simulation · pablodelucca

Pixel Agents

Pixel-art office simulation featuring autonomous AI agents in an interactive pixel environment. Visual sandbox for agent behavior experimentation. 3.6K+ GitHub stars.

View repo ↗
Voice AI · Sarvam AI

Sarvam Voice Agent with Pipecat

Integration guide for building voice AI agents using Sarvam's TTS/STT APIs with the Pipecat framework. Supports 22 Indian languages with real-time voice interaction capabilities.

View release ↗
Agent Skills Marketplace · SkillsMP

SkillsMP

Agent Skills marketplace with 66K+ skills for Claude Code, Codex CLI, and ChatGPT. Community-curated SKILL.md files sourced from public GitHub repos with quality filtering, smart search, and category browsing.

View release ↗
Agent Memory · seigneurcui

memubot

Memory system for 24/7 proactive agents like OpenClaw (moltbot, clawdbot). Provides persistent conversation memory and context management for always-on AI agent deployments.

View repo ↗
Survey Paper · UIUC / Multiple Institutions

Agentic Reasoning Survey

Comprehensive survey on LLMs as autonomous agents capable of planning, acting, and learning. Organizes agentic reasoning into three layers: foundational capabilities, self-improvement via feedback, and multi-agent collaboration.

Read paper ↗

Developer Tooling & Infra

Frameworks, playbooks, and OSS repos that level up AI engineering velocity.

Vector Database · Turbopuffer

Turbopuffer

Serverless vector and full-text search database on object storage with sub-10ms p50 latency. Handles 2.5T+ documents, 10M+ writes/sec, and 10K+ queries/sec in production. Claims costs up to 94% lower than alternatives.

View release ↗
AI Coding · Augment

Augment Code

AI coding platform with a Context Engine that maintains live understanding of entire codebases including dependencies, architecture, and history. IDE agents (VS Code, JetBrains), CLI tool, and automated code review.

View release ↗
AI Search · Exa Labs

Exa AI Search

Modern AI search engine with SERP API, website crawler, deep research API, and Websets for complex queries. Real-time web index updated every minute with SOC 2 Type II compliance.

View release ↗
AI Infrastructure · Alibaba

OpenSandbox

General-purpose sandbox platform for AI applications with multi-language SDKs, unified APIs, and Docker/Kubernetes runtimes. Built for coding agents, GUI agents, agent evaluation, and RL training.

View repo ↗
Intelligence Dashboard · koala73

WorldMonitor

Real-time global intelligence dashboard with AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface. 33K+ GitHub stars.

View repo ↗
Developer Guide · shanraisshan

Claude Code Best Practice

Comprehensive guide to Claude Code agentic engineering best practices, covering prompting patterns, workflows, and optimization techniques. 12K+ GitHub stars.

View repo ↗
Developer Tool · AlexsJones

LLMfit

CLI tool to find which LLMs run on your hardware across hundreds of models and providers. One command to match model requirements to local GPU/CPU capabilities. 12K+ GitHub stars.

View repo ↗
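
Matching a model to hardware mostly comes down to memory arithmetic: quantized weight bytes plus the KV cache for the target context, with some headroom. A rough sketch of that estimate; the formula, the 10% overhead factor, the `ModelSpec` fields, and the example model's shape are all assumptions for illustration, not LLMfit's actual method.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    params_b: float  # total parameters, in billions (all MoE experts count)
    bits: int        # weight precision after quantization
    layers: int
    kv_heads: int    # key/value heads (fewer than query heads under GQA)
    head_dim: int


def estimate_vram_gb(m, context, kv_bytes=2, overhead=1.1):
    """Rough VRAM need in GB: weights + KV cache, padded by an assumed overhead."""
    weights = m.params_b * 1e9 * m.bits / 8
    # Factor 2 covers both K and V, stored for every layer and every token.
    kv_cache = 2 * m.layers * m.kv_heads * m.head_dim * context * kv_bytes
    return (weights + kv_cache) * overhead / 1e9


def fits(m, context, vram_gb):
    return estimate_vram_gb(m, context) <= vram_gb


# Hypothetical 8B dense model, 4-bit weights, 8K context.
m = ModelSpec(params_b=8, bits=4, layers=32, kv_heads=8, head_dim=128)
print(round(estimate_vram_gb(m, 8192), 2), fits(m, 8192, 8.0), fits(m, 8192, 4.0))
```

For this hypothetical model, the estimate is about 5.6 GB (4.0 GB of weights plus about 1.07 GB of KV cache, padded by 10%), so it fits an 8 GB GPU but not a 4 GB one.
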
Code Intelligence · abhigyanpatwari

GitNexus

Client-side knowledge graph engine that runs entirely in-browser. Drop in a GitHub repo or ZIP file to get an interactive knowledge graph with built-in Graph RAG agent for code exploration. 10K+ GitHub stars.

View repo ↗
Browser Automation · pinchtab

PinchTab

High-performance browser automation bridge and multi-instance orchestrator with advanced stealth injection and real-time dashboard. Built in Go with CDP support. 5.5K+ GitHub stars.

View repo ↗
Model Quantization · Intel

Auto-Round

Accuracy-first quantization toolkit for LLMs supporting weight-only, MXFP4, NVFP4, GGUF, and adaptive schemes. Exported models run on vLLM, SGLang, and Transformers, with the aim of minimal quality degradation.

View repo ↗
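
Weight-only quantization stores each group of weights as small integer codes plus a shared scale. Auto-Round's authors describe tuning rounding decisions and clipping ranges with signed gradient descent; the sketch below shows only the plain round-to-nearest (RTN) baseline that such methods aim to beat, on a made-up weight group. It is not Auto-Round's algorithm or API.

```python
def quantize_group_rtn(weights, bits=4):
    """Symmetric round-to-nearest (RTN) quantization of one weight group.

    Returns integer codes in [-2**(bits-1), 2**(bits-1) - 1] and a shared scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # guard all-zero groups
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


group = [0.12, -0.55, 0.31, 0.02, -0.07, 0.49, -0.23, 0.70]  # made-up weights
codes, scale = quantize_group_rtn(group)
recon = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(group, recon))
print(codes, round(scale, 4), round(max_err, 4))
```

RTN bounds each weight's error by half the scale but treats weights in isolation; learned-rounding methods such as Auto-Round instead target the reconstruction error of a block's outputs, which is where their accuracy gains are claimed to come from.
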
Resource List · felladrin

Awesome AI Web Search

Curated list of AI-powered web search tools covering search engines, metasearch, and retrieval-augmented generation (RAG) projects. 1.2K+ GitHub stars.

View repo ↗
Developer Tool · srizzon

Git City

Visualize GitHub profiles as 3D pixel art buildings in an interactive city. Built with Next.js, React Three Fiber, Three.js, and Supabase. 3K+ GitHub stars.

View repo ↗
Research Tool · Paper2Notebook

Paper to Notebook

Converts research paper PDFs into executable PyTorch Jupyter notebooks using Gemini 2.0 Flash. Supports arXiv links and PDF uploads with trending paper discovery.

View release ↗
AI Coding API · Alibaba Cloud

Alibaba Cloud Coding Plan

Subscription API service for AI coding tools with flat-rate pricing. Compatible with Claude Code, OpenClaw, and Qwen Code. Predictable quota management means no surprise bills.

View release ↗
Developer Training · Vibe2Real

Vibe2Real

Debugging simulator with 15 real production failure scenarios that trains developers in incident response without AI assistance. A reported 87% failure rate is meant to demand genuine debugging skill.

View release ↗
Video SDK · VargHQ

Varg SDK

AI-native SDK for video tooling enabling programmatic video processing, editing, and generation workflows through a developer-friendly API.

View repo ↗
Cloud Tool · keidarcy

e1s

Terminal UI for managing AWS ECS resources: a k9s-style interface for ECS services, tasks, and containers. Built in Go with ECS Exec support. 847 GitHub stars.

View repo ↗
Local AI Stack · Light Heart Labs

DreamServer

One-command local AI stack: LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. Supports NVIDIA and AMD GPUs with Docker-based deployment. No cloud required.

View repo ↗
Finance Tool · Daloopa

Daloopa Investing

Open-source investment research toolkit providing financial data extraction, analysis workflows, and AI-assisted due diligence for investors and analysts. 289 GitHub stars.

View repo ↗