Open Source
Ollama 2.0 Launches with Multi-Model Serving and Native GPU Partition Support
Ollama has released version 2.0 of its local model runner, introducing concurrent multi-model serving, native GPU partition support for simultaneous model loading on a single card, and a redesigned REST API with improved streaming responses. The update also adds a model library browser with metadata on license terms and benchmark scores. Ollama 2.0 is fully compatible with the GGUF 2.0 specification.
This summary is sourced from The Verge.
Tags
Ollama, local AI, GPU, open-source, inference