Top AI News Weekly

Week 3 · Jan 12 – Jan 18, 2026

Speech AI dominates with Pocket-TTS, VibeVoice-7B, and PersonaPlex-7B. ShowUI-Aloha/π advance GUI agents, Google launches Universal Commerce Protocol, and GLM-Image brings hybrid image generation.

36 launches and research drops that matter for enterprise AI builders—curated, tagged, and ready for your next roadmap sync.

New drops: 36

Unique sources: 31

Key themes: Frontier models · Agents · Infra

Frontier Models & Research

New reasoning systems, world models, and alignment papers that move the state of the art.

Reasoning · Cerebras

GLM-4.7-REAP-218B-A32B

218B-parameter MoE model with 32B active parameters, vLLM compatible, compressed with Cerebras's REAP expert-pruning method.

Open source ↗
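
For roadmap sizing, the two headline numbers (218B total, 32B active) can be turned into a rough weights-only memory estimate. This is a back-of-the-envelope sketch: the bytes-per-parameter values are assumptions about common storage precisions, and the estimate ignores KV cache, activations, and quantization overhead.

```python
# Figures taken from the item summary above.
TOTAL_PARAMS = 218e9
ACTIVE_PARAMS = 32e9

# Assumed storage cost per parameter; real checkpoints carry extra overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


# Share of parameters used for each token in the MoE forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    print(f"{precision}: ~{gb:.0f} GB of weights")
```

All experts must still be resident in memory, so the total parameter count drives the footprint while the active count drives per-token compute.
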
Image Gen · Z.ai

GLM-Image

Hybrid autoregressive + diffusion image model with text-to-image, editing, style transfer, and identity-preserving generation.

Open source ↗
Translation · Google

TranslateGemma

Open translation model with multimodal capabilities, supporting 55 languages; the 12B variant outperforms the Gemma 3 27B baseline.

Open source ↗
LLM Research · Stanford

Verbalized Sampling

Training-free prompting strategy to mitigate mode collapse in LLMs, achieving 2-3x diversity improvement.

Open source ↗
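
Because Verbalized Sampling is a prompting strategy, it can be sketched without any training code: ask the model for several candidate responses with stated probabilities, then sample from that list. The prompt wording, the JSON reply format, and the helper names below are illustrative assumptions, not the authors' exact template, and the model reply is hard-coded so the sketch runs without an API call.

```python
import json
import random


def build_vs_prompt(task: str, k: int = 5) -> str:
    """Wrap a task in a verbalized-sampling style instruction (illustrative wording)."""
    return (
        f"{task}\n\n"
        f"Generate {k} different responses. Return a JSON list of objects, "
        'each with the keys "text" and "probability", where "probability" '
        "is your estimate of how likely that response is."
    )


def sample_response(raw_json: str, rng: random.Random) -> str:
    """Parse the JSON reply and draw one response, weighted by its stated probability."""
    candidates = json.loads(raw_json)
    texts = [c["text"] for c in candidates]
    weights = [float(c["probability"]) for c in candidates]
    return rng.choices(texts, weights=weights, k=1)[0]


# Hypothetical model reply standing in for a real LLM call.
mock_reply = json.dumps([
    {"text": "Joke A", "probability": 0.5},
    {"text": "Joke B", "probability": 0.3},
    {"text": "Joke C", "probability": 0.2},
])

prompt = build_vs_prompt("Tell me a joke about coffee.", k=3)
choice = sample_response(mock_reply, random.Random(0))
print(prompt)
print("Sampled:", choice)
```

In practice the reply would come from whichever chat API you use, and malformed JSON from the model would need handling before sampling.
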
Agent LLM · NVIDIA

ToolOrchestra

Small orchestrator models that coordinate other models and tools for complex reasoning, evaluated on the Humanity's Last Exam benchmark.

Open source ↗
Small TTS · ekwek

Soprano-1.1-80M

Ultra-compact 80M TTS model with <1GB memory, 32kHz crystal clear audio, WebUI and OpenAI-compatible endpoint.

Open source ↗
Small TTS · Supertone

Supertonic-2

66M param multilingual TTS, 167× faster than real-time, optimized for on-device deployment.

Open source ↗
Speech AI · NVIDIA

PersonaPlex-7B

Full-duplex speech model with consistent persona, based on Moshi architecture for natural conversations.

Open source ↗
Speech AI · Microsoft

VibeVoice-7B

Frontier open-source TTS for podcasts and multi-speaker audio with 7.5Hz continuous speech tokenizers.

Open source ↗
Speech AI · Kyutai Labs

Pocket-TTS

Lightweight CPU-only TTS that fits in your pocket: pip install and go, with a 1.6B delayed streams model.

Open source ↗
Speech AI · Silero

Silero Models

Pre-trained enterprise-grade STT/TTS models with multi-language support via PyTorch hub.

Open source ↗
Audio SR · ysharma3501

NovaSR

Tiny 52KB audio upsampler that converts 16kHz audio to 48kHz at 3500× real-time speed for TTS enhancement.

Open source ↗
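
The real-time multipliers quoted for Supertonic-2 (167×) and NovaSR (3500×) translate into wall-clock time with simple arithmetic. The sketch below assumes the quoted factors hold on your hardware, which is unlikely to be exact since such figures depend on the device and batch setup.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to process audio at a given multiple of real time."""
    return audio_seconds / realtime_factor


ONE_HOUR = 3600.0  # seconds of audio

# Factors as quoted in the item summaries above.
supertonic_s = processing_seconds(ONE_HOUR, 167.0)
novasr_s = processing_seconds(ONE_HOUR, 3500.0)

# NovaSR's upsampling ratio: output samples produced per input sample.
upsample_ratio = 48_000 // 16_000

print(f"Supertonic-2: {supertonic_s:.2f} s per hour of speech")
print(f"NovaSR: {novasr_s:.2f} s per hour of audio, {upsample_ratio}x more samples")
```

At these rates an hour of audio takes roughly 21.6 seconds to synthesize and about one second to upsample, so chaining the two adds little latency to a batch pipeline.
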
Image Gen · Unsloth

Qwen-Image-2512 GGUF

Top open-source diffusion model with realistic people, rich textures, and accurate text rendering via ComfyUI.

Open source ↗

Immersive Media & Simulation

Video, audio, and physics-native generation techniques shaping spatial computing.

Video Gen · Research

VerseCrafter

4D-aware video world model with unified control over camera and multi-object motion via GeoAdapter.

Open source ↗
Image Edit · AI Forever

VIBE

Visual instruction-based image editor, powerful open-source framework for text-guided editing.

Open source ↗
Depth Est · AIGeeksGroup

AnyDepth

Simple and efficient zero-shot monocular depth estimation with reduced parameters and computational cost.

Open source ↗
3D Gen · Meta

ShapeR

Robust conditional 3D shape generation from casual captures using Aria glasses pipeline.

Open source ↗
Animation · Research

RigMo

Unified rig and motion learning from mesh sequences with Gaussian bones and skinning weights.

Open source ↗
Medical AI · Research

HeartMula

Heart-related medical AI visualization and reconstruction research.

Open source ↗
3D Recon · HKUST

UniSH

Unified scene and human reconstruction in a single feed-forward pass.

Open source ↗
Video Gen · PixVerse

PixVerse V5

AI video generation platform with the latest V5 model for high-quality video creation.

Open source ↗

Agents & Embodied Intelligence

Embodied agents learning to act in complex virtual and hybrid worlds.

GUI Agent · ShowLab

ShowUI-Aloha

Human-taught GUI agent that learns workflows from screen recordings with recorder, learner, planner, and actor.

Open source ↗
GUI Agent · ShowLab

ShowUI-π

450M flow-based VLA model for continuous GUI actions, generating smooth clicks and drags from screen observations.

Open source ↗
Vision Agent · GetStream

Vision-Agents

Open framework for real-time video AI with Stream's edge network for ultra-low latency.

Open source ↗
Agent Research · NVIDIA Research

SLM Agents Research

Research on small language models as the future of agentic AI for specialized applications.

Open source ↗
RL Env · Meta PyTorch

OpenEnv

Interface library for RL post-training with environments including Echo, Code Sandbox, and Oumi integration.

Open source ↗
Education · Adam Maj

Tiny-GPU

Minimal GPU design in Verilog for learning GPU architecture from the ground up; an educational project with 10k+ stars.

Open source ↗

Developer Tooling & Infra

Frameworks, playbooks, and OSS repos that level up AI engineering velocity.

MCP · Phil Schmid

MCP-CLI

Lightweight CLI for MCP servers with JSON output, agent-optimized for Gemini CLI and Claude Code.

Open source ↗
Claude Tool · thedotmack

Claude-Mem

Claude Code plugin for persistent memory across sessions, captures tool usage and injects context.

Open source ↗
Protocol · Google

Universal Commerce Protocol

Open protocol for agentic commerce, co-developed with Shopify, Stripe, Walmart, and 20+ partners.

Open source ↗
Tutorial · DailyDoseOfDS

MLOps Crash Course

Comprehensive MLOps + LLMOps tutorial series covering foundations to production deployment.

Open source ↗
Prompting · labeldekho

Perplexity Prompts Guide

Battle-tested prompting strategies for RAG-based search engines like Perplexity AI.

Open source ↗
PDF Tool · pikepdf

pikepdf

Python library for reading and writing PDFs, powered by QPDF with Pythonic interface.

Open source ↗
CLI Tool · supreme-gg-gg

Instagram-CLI

Terminal UI for Instagram in TypeScript/Python, the ultimate weapon against brainrot.

Open source ↗
ComfyUI · ComfyUI Studio

ComfyUI Installers

1-click installers with UV for 100x faster installs, Torch 2.9, CUDA 13, FaceID, and IP-Adapter.

Open source ↗
Resource List · ad-si

Awesome Video Production

Curated list of video production tools, AI generators, teleprompters, and editing software.

Open source ↗