Open-Source-Lab

v31

Model Catalogue

A place to test local models

Curated local and open-source AI models, from community workhorses to experimental frontier architectures.

Browse 50 Models Try Live Chat

Test Local Models: Try models instantly in your browser without installing them.

Check the specs: Compare parameters, context sizes, and find your niche.

Exchange knowledge: Discuss hardware configurations with other open-source enthusiasts.

Model of the Month

Spotlight — June 2026

Qwen3.5-27B

Open · New · Popular

Alibaba's 27B dense hybrid model pairs Gated Delta Networks with attention for long-context efficiency. It is strong at coding, reasoning, and agent tasks across 201 languages, ships under Apache 2.0, and remains locally runnable on high-end consumer rigs.

Parameters: 27B · Context: 262K · License: Apache 2.0 · Architecture: Dense hybrid · Language: 201 langs

Key:

Open: Apache / MIT / similar license
New: Released in the last 90 days
Experimental: Early / research stage
Popular: Community favourite
Local: Runs on consumer hardware

Language & Reasoning Models

Open · Popular

Hermes-3-405B

Nous Research

Built for thoughtful discussion and helpful, structured reasoning. Strong multilingual generation, precise instruction-following, and clean formatting across technical and analytical tasks.

Params: 405B · Context: 128K · License: Llama 3.1 · Arch: Dense

Open · New

Hermes-4-70B

Nous Research

The newest Hermes generation in an efficient 70B class. Balanced speed and reasoning depth with controlled verbosity, reliable structured output, and strong code generation and debugging.

Params: 70B · Context: 128K · License: Llama 3.1 · Arch: Dense

Open · New

Phi-4

Microsoft Research

Phi-4 delivers long, thorough reasoning responses for deep research and complex problem solving. It is well suited to in-depth analysis, academic assistance, and technical explanations.

Params: 14B · Mode: CoT · License: MIT · Files: PDF ✓

Open · New · Local

Qwen3.5-27B

Alibaba / Qwen Team

A 27B dense hybrid model pairing Gated Delta Networks with attention for long-context efficiency. Strong at coding, reasoning, and agent tasks across 201 languages under Apache 2.0.

Params: 27B · Context: 262K · License: Apache 2.0 · Language: 201 langs

Open · Local

Mistral-NeMo

Mistral AI / NVIDIA

A 12B collaboration between Mistral and NVIDIA, designed to be a drop-in efficient workhorse. State-of-the-art multilingual coverage, large context, and strong instruction following on a single GPU.

Params: 12B · Context: 128K · License: Apache 2.0 · Language: Multilingual

Open · Popular

DeepSeek-V3.1

DeepSeek

A 671B MoE flagship activating 37B per token. Exceptional math and reasoning with MIT-licensed weights. A community benchmark king that rivals closed frontier labs at a fraction of the cost.

Params: 671B / 37B active · Context: 128K · License: MIT · Arch: MoE

Code & Engineering Models

New · Popular

Laguna-M.1

Poolside

Poolside's leading coding agent model for intricate software development and agentic engineering workflows. Excels at repository-scale reasoning, debugging, refactoring, and autonomous development.

Context: 128K · Output: 8K · Quant: fp8 · Focus: Agent

Open · New

GLM-5.2

Zhipu AI

An advanced model combining deep engineering productivity with speed and emotional intelligence. Excels at autonomously driving end-to-end project execution, debugging, and log analysis.

Params: 754B · Context: 205K · License: MIT · Files: Multifile

Open · New

Minimax-M2.7

MiniMax

A fast, next-generation model built for engineering productivity. It stands out at autonomously driving end-to-end project execution and at handling complex log analysis and debugging.

Context: 205K · Files: Multimodal · Focus: Agentic · Profile: Fast

Open · New

MiMo-V2-Flash

Xiaomi MiMo

A low-cost MoE reasoning model designed for coding and agent performance. It dynamically routes tokens through specialized experts and uses a hybrid attention architecture for fast inference.

Params: 309B / 15B · Context: 256K · Arch: MoE · Files: Multimodal

Open · Local

Qwen2.5-Coder-32B

Alibaba / Qwen Team

The open coding workhorse that matches GPT-4o on code benchmarks while running locally. Supports 92 programming languages, fill-in-the-middle completion, and long repository context.

Params: 32B · Context: 128K · License: Apache 2.0 · Code: 92 langs

Open · Popular

DeepSeek-Coder-V2

DeepSeek

A MoE code specialist activating 21B of 236B parameters. State-of-the-art on HumanEval and competitive programming, with strong repo-level understanding and an MIT-licensed Lite variant for local rigs.

Params: 236B / 21B active · Context: 128K · License: MIT · Arch: MoE

Vision & Multimodal

Open · New · Local

Mistral-Small-4

Mistral AI

A unified release consolidating several Mistral models into one system: Magistral's reasoning, Pixtral's multimodal understanding, and Devstral's agentic coding.

Modality: Multimodal · Magistral: Reasoning · Devstral: Coding · Pixtral: Vision

New

Nova-Lite-2

Amazon

A cost-effective multimodal foundation model that efficiently processes text, images, and video, prioritizing speed and affordability with a massive 1M-token context window.

Context: 1M · Modality: Txt+Img+Vid · Maker: Amazon · Files: PDF/JPG

Open · Popular

Qwen2.5-VL-72B

Alibaba / Qwen Team

A leading open vision-language model with precise document parsing, object grounding, and long-video understanding. Excels at OCR, chart analysis, and UI agent control.

Params: 72B · Context: 128K · License: Apache 2.0 · Vision: OCR+Video

Open · Local

Pixtral-12B

Mistral AI

A natively multimodal 12B model with a 400M vision encoder. Handles variable image sizes and interleaved text+image at high throughput, runnable on a single 24GB consumer GPU.

Params: 12B · Context: 128K · License: Apache 2.0 · Vision: Native VLM

Open

Llama-3.2-90B-Vision

Audio & Embedding Models

Open · Popular

Whisper-Large-v3

OpenAI

The de facto open speech-to-text standard. Robust multilingual transcription and translation across 99 languages with strong noise resilience. Runs in real time via whisper.cpp on consumer hardware.

Params: 1.55B · Language: 99 langs · License: MIT · Task: ASR

Open · New

Qwen2-Audio

Alibaba / Qwen Team

A versatile audio-language model that understands speech, ambient sound, and music. Supports voice-chat and audio-analysis modes for transcription, captioning, and spoken Q&A.

Params: 7B · Modality: Audio+Txt · License: Apache 2.0 · Task: Voice Q&A

Open · Local

BGE-M3

BAAI

A versatile embedding model supporting dense, sparse, and multi-vector retrieval in one. Handles 100+ languages and inputs up to 8192 tokens, the go-to backbone for local RAG pipelines.

Params: 560M · Tokens: 8192 · License: MIT · Retrieval: Hybrid

Open · Local

Nomic-Embed-v1.5

Nomic AI

A highly tuned 137M-parameter local text embedding model. It features a large 8192-token context window and supports variable output dimensions, making it well suited to local RAG pipelines.

Params: 137M · Tokens: 8192 · License: Apache 2.0 · Task: Embed

Open · New

F5-TTS

F5-TTS Team

A non-autoregressive speech synthesis model utilizing flow matching. Capable of high-fidelity, zero-shot voice cloning with highly natural prosody and voice inflections.

Params: 335M · Context: Zero-shot · License: MIT · Task: Audio Gen

Open · Experimental

Moshi-v1.0

Kyutai

An experimental real-time spoken audio-to-audio foundation model. Processes speech and generates natural, low-latency voice responses natively.

Params: 7B · Task: Duplex · License: CC-BY 4.0 · Audio: Real-time

Experimental & Research

New · Experimental · Local

Trinity-Mini

Arcee AI

A compact, efficiency-first assistant model tuned for fast everyday reasoning, summarization, and chat. Designed to deliver responsive performance on modest hardware while staying lightweight.

Class: Mini · Context: 128K · Profile: Efficient · Arch: Dense

Open · Experimental · Local

BitNet-b1.58-3B

Microsoft Research

A radically efficient 1.58-bit ternary-weight architecture. Matches full-precision quality at a fraction of the memory and energy, hinting at CPU-only LLM inference. Early research weights.

Params: 3B · Weights: 1.58-bit · License: MIT · Arch: Ternary

Open · Experimental

Mamba-2-2.7B

Cartesia / CMU

A high-efficiency State Space Model (SSM) architecture. Boasts linear time complexity and massive context capabilities, matching Transformers with significantly less overhead.

Params: 2.7B · Context: 1M · License: Apache 2.0 · Arch: SSM

Open · Experimental

RWKV-6-World-7B

RWKV Foundation

An RNN architecture designed to deliver Transformer-level quality. It features linear attention and low-resource multilingual execution without the quadratic scaling penalty of standard attention.

Params: 7B · Context: Infinite · License: Apache 2.0 · Arch: RNN-Trans

Open · Experimental

Samba-CoE-v0.2

SambaNova

A hybrid architecture combining linear attention State Space Models with Mixture-of-Experts (MoE) to handle large context windows with exceptional generation speeds.

Params: 40B / 8B active · Context: 256K · License: Apache 2.0 · Arch: SSM-MoE

Open · Experimental

Chameleon-7B

Ready to Go Local?

Start running open models today. No API keys, no subscriptions, no data leaving your machine.

Model Library

Open-Source Models

The full catalogue of open and open-weight models with their core specs. Search by name or filter by modality.

Documentation

Running Local Models

A practical field guide to running open-weight models on your own hardware: engines, quantization, memory math and performance tips.

1. Pick an inference engine (start here)

Your engine determines speed, hardware support and ease of use. The most popular options in 2026:

Ollama — easiest entry point; one-line model pulls and a built-in API. Great for beginners and Mac users.
llama.cpp — the C/C++ core powering most local tooling. Runs GGUF models on CPU, CUDA, Metal, Vulkan and ROCm.
vLLM — high-throughput GPU serving with PagedAttention; ideal for multi-user or production workloads.
LM Studio — polished desktop GUI for browsing, downloading and chatting with GGUF models, no terminal needed.

$ ollama run qwen2.5-coder:32b
$ ./llama-server -m model.gguf -ngl 99 -c 8192
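Ollama's built-in API, mentioned in the engine list above, can also be called from a script once a model has been pulled. The sketch below assumes Ollama is listening on its default local endpoint (http://localhost:11434/api/generate) and reuses the qwen2.5-coder:32b tag from the command above. The helper names build_request and generate are illustrative, and nothing is sent until generate() is called.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama instance with the model pulled):
# print(generate("qwen2.5-coder:32b", "Explain what a GGUF file is in one sentence."))
```

Keeping request construction separate from sending makes the payload easy to inspect before anything touches the network.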

2. Understand quantization

Quantization shrinks model weights from 16-bit down to 8, 4 or fewer bits, trading a little quality for big memory savings. Rules of thumb:

Q8_0 — near-lossless, ~1 byte/param. Use when you have the VRAM.
Q4_K_M — the sweet spot. ~4.5 bits/param with minimal quality loss; the default most people should run.
Q3 / Q2 — emergency tier for fitting big models on small cards; noticeable degradation.
Prefer a larger model at Q4 over a smaller model at Q8 when quality matters.
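The rules of thumb above translate directly into a size estimate for the weights alone. The following is a minimal sketch using the approximate bits-per-parameter figures quoted in the list (16 for unquantized, 8 for Q8_0, 4.5 for Q4_K_M); real GGUF files vary somewhat, so treat the output as orientation only. The function name is illustrative.

```python
# Approximate bits per parameter, taken from the rules of thumb above.
BITS_PER_PARAM = {"FP16": 16.0, "Q8_0": 8.0, "Q4_K_M": 4.5}


def weight_size_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_PARAM[quant] / 8


# Compare a smaller model at Q8 with a larger model at Q4.
for label, params, quant in [("14B", 14, "Q8_0"), ("32B", 32, "Q4_K_M")]:
    print(f"{label} @ {quant}: ~{weight_size_gb(params, quant):.1f} GB")
```

Under these assumptions a 14B model at Q8_0 and a 32B model at Q4_K_M land in a similar memory range (about 14 GB versus 18 GB), which is why the larger model at Q4 is usually the better use of the same card.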

3. Memory math (VRAM estimate)

Approximate VRAM to load a model: params × bytes-per-param + KV cache. At Q4 the weights alone need roughly 0.55 GB per billion parameters; the estimates below add headroom for context and runtime overhead.

7B @ Q4 → ~5 GB · fits an 8 GB card.
14B @ Q4 → ~9 GB · fits 12 GB.
32B @ Q4 → ~20 GB · needs 24 GB (RTX 4090 / 3090).
70B @ Q4 → ~42 GB · dual-GPU or 48 GB+, or offload to RAM.

Long context inflates the KV cache; a 128K window can add several GB on top.
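To see how context length drives the KV cache, the sketch below uses the standard estimate of two tensors (keys and values) per layer, per KV head, per position. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example chosen for illustration, not the configuration of any catalogue model, and the 0.55 GB per billion parameters figure is the Q4 rule of thumb from above.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Keys and values (factor 2) stored for every layer, KV head and position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(params_billion: float, kv_bytes: int, vram_gb: float,
         gb_per_billion: float = 0.55):
    """Rough check of Q4 weights plus KV cache against available VRAM."""
    need_gb = params_billion * gb_per_billion + kv_bytes / 1e9
    return need_gb <= vram_gb, need_gb


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (8_192, 131_072):
    kv = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    ok, need = fits(7, kv, vram_gb=8)
    print(f"ctx={ctx}: KV cache {kv / 2**30:.0f} GiB, total ~{need:.1f} GB, fits 8 GB: {ok}")
```

For this assumed shape, a 16-bit cache costs 1 GiB at an 8K window and 16 GiB at 128K. Setting bytes_per_elem to 1 models an 8-bit quantized cache and halves those figures.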

4. Hardware tips

Apple Silicon shines for local LLMs thanks to unified memory — an M-series with 64GB+ runs large MoE models comfortably.
NVIDIA remains fastest via CUDA; the 3090/4090's 24GB is the consumer sweet spot.
CPU + RAM offload works for big models but is slow; expect single-digit tokens/sec.
MoE models (e.g. Qwen3.5, DeepSeek-V3) activate only a fraction of params per token, so they generate far faster than their total size suggests, although all parameters must still fit in memory.
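A common back-of-the-envelope model treats token generation as memory-bandwidth bound: each new token requires reading the active weights once, so throughput is capped at bandwidth divided by active bytes. The sketch below applies that simplifying assumption to a dense 70B model and to a 671B MoE with 37B active parameters. The bandwidth figures (1000 GB/s for a high-end GPU, 60 GB/s for system RAM) are illustrative placeholders rather than measurements, and real throughput will be lower.

```python
def max_tokens_per_sec(active_params_billion: float, bits_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    active_gb = active_params_billion * bits_per_param / 8
    return bandwidth_gb_s / active_gb


Q4_BITS = 4.5  # approximate bits per parameter at Q4_K_M

for label, active in [("70B dense", 70), ("671B MoE, 37B active", 37)]:
    for device, bandwidth in [("GPU ~1000 GB/s", 1000), ("RAM ~60 GB/s", 60)]:
        bound = max_tokens_per_sec(active, Q4_BITS, bandwidth)
        print(f"{label} on {device}: at most ~{bound:.1f} tokens/sec")
```

The bound depends only on active parameters, which is why an MoE model can decode faster than a much smaller dense one, and why RAM offload lands in single digits.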

5. Performance & quality tips

Set -ngl (GPU layers) as high as your VRAM allows; when the whole model fits, offload every layer to the GPU.
Lower temperature (0.1–0.3) for code and factual tasks; raise it (0.7–1.0) for creative writing.
Enable flash attention and KV-cache quantization to fit longer contexts.
For reasoning models, allow plenty of output tokens so chain-of-thought isn't cut off.
Use the chat template that matches the model family; a wrong template silently wrecks output.
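The temperature advice above is easier to reason about with the underlying math in view: logits are divided by the temperature before the softmax, so low values concentrate probability on the top token and high values spread it out. This is a minimal sketch with made-up logits for three candidate tokens; it is not tied to any particular engine.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.2f}" for p in probs))
```

At T=0.2 almost all of the probability mass sits on the first token, which suits code and factual tasks; at T=1.0 the alternatives keep a meaningful share, which suits creative writing.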

Benchmarks

Model Benchmarks

Reasoning, knowledge and coding scores for catalogue models alongside reference open-source baselines. Click any column header to sort.

Model	Type	MMLU	GPQA	HumanEval	MATH	Avg

MMLU = general knowledge · GPQA = graduate-level science (Diamond) · HumanEval = code generation · MATH = competition mathematics. Scores are representative composite figures gathered from public reports and provider cards for orientation only — verify against primary sources before citing.

Community Hub

Join us on Discord

Copy this link to join our AI community on Discord and chat about local models, hardware and benchmarks.

Poe Integration

Open-Source Models on Poe.com

Test high-performance open-source models instantly in your browser on Poe. These setups bypass local installation steps and run directly on hosted hardware.

Qwen3.5-27B

Alibaba's robust dense hybrid reasoning and coding model, highly competent in multilingual generation.

Hermes-3-405B

Nous Research's instruction-following flagship built on Llama 3.1, optimized for structured agent tasks.

DeepSeek-V3

A 671B Mixture-of-Experts model delivering fast token generation with advanced mathematical reasoning.

Phi-4

Microsoft's small-footprint reasoning model, offering deep chain-of-thought analysis and factual verification.

Mistral-NeMo

An efficient 12B collaborative build between Mistral and NVIDIA, optimized for solid multilingual tasks.

Qwen2.5-Coder-32B

The baseline workhorse for open coding applications, supporting multiple software engineering workflows.

Model Playground

Chat with Open Models

Select an open-source model and start chatting.
