Open Source

Ollama 2.0 Launches with Multi-Model Serving and Native GPU Partition Support

Ollama has released version 2.0 of its local model runner, introducing concurrent multi-model serving, native GPU partition support for simultaneous model loading on a single card, and a redesigned REST API with improved streaming responses. The update also adds a model library browser with metadata on license terms and benchmark scores. Ollama 2.0 is fully compatible with the GGUF 2.0 specification.

This summary is sourced from The Verge. For the full story with original reporting, analysis, and additional context, follow the source link below.

Tags

Ollama, local AI, GPU, open-source, inference
Read Full Story on The Verge