Open Source Lab
v13 · Model Catalogue
A place to test local models
Curated local and open-source AI models. From community workhorses to experimental frontier architectures.
Model of the Month
Spotlight — June 2026
Alibaba's Mixture-of-Experts flagship activates only 22B of its 235B parameters per token, offering frontier-class reasoning with remarkable efficiency. A hybrid thinking mode toggles chain-of-thought on or off at inference time. It outperforms GPT-4o on multiple benchmarks while remaining fully runnable locally on high-end consumer rigs.
Language & Reasoning Models
Built for thoughtful discussion and helpful, structured reasoning. Strong multilingual generation, precise instruction-following, and clean formatting across technical and analytical tasks.
The newest Hermes generation in an efficient 70B class. Balanced speed and reasoning depth with controlled verbosity, reliable structured output, and strong code generation and debugging.
Phi-Reasoning delivers long, thorough reasoning responses for deep research and complex problem solving. Optimal for in-depth analysis, academic assistance, and technical explanations.
A 27B dense hybrid model pairing Gated Delta Networks with attention for long-context efficiency. Strong at coding, reasoning, and agent tasks across 201 languages under Apache 2.0.
A 12B collaboration between Mistral and NVIDIA, designed to be a drop-in efficient workhorse. State-of-the-art multilingual coverage, large context, and strong instruction following on a single GPU.
A 671B MoE flagship activating 37B per token. Exceptional math and reasoning with MIT-licensed weights. A community benchmark king that rivals closed frontier labs at a fraction of the cost.
Code & Engineering Models
Poolside's leading coding agent model for intricate software development and agentic engineering workflows. Excels at repository-scale reasoning, debugging, refactoring, and autonomous development.
An advanced model combining deep engineering productivity with speed and emotional intelligence. Excels at autonomously driving end-to-end project execution, debugging, and log analysis.
A fast, next-generation model combining engineering productivity with speed. Stands out at autonomously driving end-to-end project execution and handling complex log analysis and debugging.
A cheap MoE reasoning model designed for coding and agent performance. Dynamically routes tokens through specialized experts with a hybrid attention architecture for fast inference.
The open coding workhorse that matches GPT-4o on code benchmarks while running locally. Supports 92 programming languages, fill-in-the-middle completion, and long repository context.
A MoE code specialist activating 21B of 236B parameters. State-of-the-art on HumanEval and competitive programming, with strong repo-level understanding and an MIT-licensed Lite variant for local rigs.
Vision & Multimodal
A unified iteration consolidating several Mistral models into one system: Magistral's reasoning, Pixtral's multimodal understanding, and Devstral's agentic coding in one workflow.
A cost-effective multimodal foundation model that efficiently processes text, images, and video, prioritizing speed and affordability with a massive 1M-token context window.
A leading open vision-language model with precise document parsing, object grounding, and long-video understanding. Excels at OCR, chart analysis, and UI agent control.
A natively multimodal 12B model with a 400M vision encoder. Handles variable image sizes and interleaved text+image at high throughput, runnable on a single 24GB consumer GPU.
Meta's flagship open vision model for high-resolution image reasoning, document QA, and visual grounding. Built on the proven Llama 3 backbone with an adapter-based image encoder.
A top open-weight vision specialist closing the gap with GPT-4o on multimodal benchmarks. Outstanding at fine-grained perception, multi-image reasoning, and scientific diagram understanding.
Audio & Embedding Models
The de facto open speech-to-text standard. Robust multilingual transcription and translation across 99 languages with strong noise resilience. Runs in real time via whisper.cpp on consumer hardware.
A versatile audio-language model that understands speech, ambient sound, and music. Supports voice-chat and audio-analysis modes for transcription, captioning, and spoken Q&A.
A versatile embedding model supporting dense, sparse, and multi-vector retrieval in one. Handles 100+ languages and inputs up to 8192 tokens, the go-to backbone for local RAG pipelines.
A highly tuned 137M-parameter local text embedding model. Features a large 8192-token context window and support for variable output dimensions, perfect for local RAG pipelines.
A non-autoregressive speech synthesis model utilizing flow matching. Capable of high-fidelity, zero-shot voice cloning with highly natural prosody and voice inflections.
An experimental real-time spoken audio-to-audio foundation model. Processes speech and generates natural, low-latency voice responses natively.
Experimental & Research
A compact, efficiency-first assistant model tuned for fast everyday reasoning, summarization, and chat. Designed to deliver responsive performance on modest hardware while staying lightweight.
A radically efficient 1.58-bit ternary-weight architecture. Matches full-precision quality at a fraction of the memory and energy, hinting at CPU-only LLM inference. Early research weights.
A high-efficiency State Space Model (SSM) architecture. Boasts linear time complexity and massive context capabilities, matching Transformers with significantly less overhead.
An RNN architecture designed to deliver Transformer-level quality. Features linear attention and low-resource multilingual execution without traditional scaling penalties.
A hybrid architecture combining linear attention State Space Models with Mixture-of-Experts (MoE) to handle large context windows with exceptional generation speeds.
A native early-stage mixed-modal architecture engineered to ingest and output both images and text sequentially, rather than using separate external projection encoders.
Ready to Go Local?
Start running open models today. No API keys, no subscriptions, no data leaving your machine.
Model Library
The full catalogue of open and open-weight models with their core specs. Search by name or filter by modality.
Documentation
A practical field guide to running open-weight models on your own hardware: engines, quantization, memory math and performance tips.
1. Pick an inference engine
Your engine determines speed, hardware support and ease of use. The most popular options in 2026:
- Ollama — easiest entry point; one-line model pulls and a built-in API. Great for beginners and Mac users.
- llama.cpp — the C/C++ core powering most local tooling. Runs GGUF models on CPU, CUDA, Metal, Vulkan and ROCm.
- vLLM — high-throughput GPU serving with PagedAttention; ideal for multi-user or production workloads.
- LM Studio — polished desktop GUI for browsing, downloading and chatting with GGUF models, no terminal needed.
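As a minimal sketch of what a "built-in API" looks like in practice, the snippet below builds and sends a request to Ollama's local HTTP endpoint using only the Python standard library. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model tag `llama3` in the usage note is a placeholder, not a catalogue recommendation, and the helper names are illustrative.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a default install on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generation request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

With the server running, `generate("llama3", "Say hello in one sentence.")` would return the model's reply as a string. Nothing leaves the machine, since the request goes to localhost.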
2. Understand quantization
Quantization shrinks model weights from 16-bit down to 8, 4 or fewer bits, trading a little quality for big memory savings. Rules of thumb:
- Q8_0 — near-lossless, ~1 byte/param. Use when you have the VRAM.
- Q4_K_M — the sweet spot. ~4.5 bits/param with minimal quality loss; the default most people should run.
- Q3 / Q2 — emergency tier for fitting big models on small cards; noticeable degradation.
- Prefer a larger model at Q4 over a smaller model at Q8 when quality matters.
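To make the trade-off concrete, here is a simplified sketch of block-wise quantization: each block of 32 weights shares one scale, and every weight is stored as a small integer code. It is loosely modelled on schemes such as Q8_0 and is not the exact GGUF implementation; real K-quants like Q4_K_M use a more elaborate layout. The weight values are randomly generated for illustration.

```python
import random

BLOCK_SIZE = 32  # weights per block; each block shares a single scale


def quantize_block(block):
    """Map floats to integer codes in [-127, 127] plus one shared scale."""
    max_abs = max(abs(w) for w in block)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the codes."""
    return [scale * c for c in codes]


def bits_per_weight(code_bits, scale_bits=16, block_size=BLOCK_SIZE):
    """Storage cost per weight, counting the shared scale."""
    return (code_bits * block_size + scale_bits) / block_size


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]  # illustrative values
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"max round-trip error: {max_error:.6f} (half a step is {scale / 2:.6f})")
print(f"8-bit codes: {bits_per_weight(8)} bits/weight")
print(f"4-bit codes: {bits_per_weight(4)} bits/weight")
```

The round-trip error never exceeds half a quantization step, which is why 8-bit storage is close to lossless. Using fewer bits per code widens the step, and that is where the quality loss at Q3 and Q2 comes from.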
3. Memory math (VRAM estimate)
Approximate VRAM to load a model: params × bytes-per-param + KV-cache. At Q4 you need roughly 0.55 GB per billion parameters, plus context overhead.
- 7B @ Q4 → ~5 GB · fits an 8 GB card.
- 14B @ Q4 → ~9 GB · fits 12 GB.
- 32B @ Q4 → ~20 GB · needs 24 GB (RTX 4090 / 3090).
- 70B @ Q4 → ~42 GB · dual-GPU or 48 GB+, or offload to RAM.
Long context inflates the KV cache; a 128K window can add several GB on top.
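The estimate above can be sketched as a small calculator. The 0.55 GB per billion parameters figure comes from the text; the KV-cache formula (keys plus values, per layer, per KV head, per position) is the standard one, but the layer count, head count, head size and the 1 GB overhead allowance below are hypothetical values chosen for illustration, not the specs of any catalogue model.

```python
GB_PER_BILLION_AT_Q4 = 0.55  # rule of thumb from the text


def weights_gb(params_billion, gb_per_billion=GB_PER_BILLION_AT_Q4):
    """Approximate size of the quantized weights in GB."""
    return params_billion * gb_per_billion


def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """KV-cache size in GB; the factor 2 covers keys and values."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value
    return total_bytes / 1e9


def fits(params_billion, vram_gb, kv_gb=0.0, overhead_gb=1.0):
    """True if weights, KV cache and an assumed runtime overhead fit in VRAM."""
    return weights_gb(params_billion) + kv_gb + overhead_gb <= vram_gb


for size in (7, 14, 32, 70):
    print(f"{size}B @ Q4: ~{weights_gb(size):.1f} GB of weights before context overhead")

# Hypothetical model shape: 32 layers, 8 KV heads of size 128, 16-bit cache.
kv = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(f"KV cache at 8K context: ~{kv:.2f} GB")
print("7B fits an 8 GB card:", fits(7, vram_gb=8, kv_gb=kv))
```

For the hypothetical shape above, an 8K context costs about 1.07 GB, and the cache grows linearly with context length. This is why the per-model figures in the list sit above the raw weight size.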
4. Hardware tips
- Apple Silicon shines for local LLMs thanks to unified memory — an M-series with 64GB+ runs large MoE models comfortably.
- NVIDIA remains fastest via CUDA; the 3090/4090's 24GB is the consumer sweet spot.
- CPU + RAM offload works for big models but is slow; expect single-digit tokens/sec.
- MoE models (e.g. Qwen3.5, DeepSeek-V3) activate only a fraction of params per token, so they run far faster than their total size suggests. All of the weights must still fit in memory, so size the hardware for the total parameter count.
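The MoE point can be put into numbers with a short sketch. The active and total parameter counts are the ones quoted in the catalogue entries (22B of 235B, 37B of 671B, 21B of 236B); the memory figure reuses the 0.55 GB per billion rule of thumb at Q4 and is an estimate of weights only.

```python
GB_PER_BILLION_AT_Q4 = 0.55  # rule of thumb from the memory section


def active_fraction(active_billion, total_billion):
    """Share of the weights used for each generated token."""
    return active_billion / total_billion


# (active, total) parameter counts in billions, as quoted in the catalogue text.
catalogue_examples = [(22, 235), (37, 671), (21, 236)]

for active, total in catalogue_examples:
    share = active_fraction(active, total)
    footprint = total * GB_PER_BILLION_AT_Q4
    print(
        f"{active}B of {total}B active: {share:.1%} of weights per token, "
        f"~{footprint:.0f} GB of weights at Q4"
    )
```

Per-token compute scales with the active share, while the memory footprint follows the total size, because any expert may be selected for the next token.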
5. Performance & quality tips
- Set -ngl (GPU layers) as high as your VRAM allows; when every layer fits, the whole model runs on the GPU.
- Lower temperature (0.1–0.3) for code and factual tasks; raise it (0.7–1.0) for creative writing.
- Enable flash attention and KV-cache quantization to fit longer contexts.
- For reasoning models, allow plenty of output tokens so chain-of-thought isn't cut off.
- Use the chat template that matches the model family; a wrong template silently wrecks output.
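The temperature advice is easier to see with a small sketch of temperature-scaled softmax, the usual way sampling temperature is applied to a model's output logits. The three logits are made-up values for illustration, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for temperature in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    formatted = ", ".join(f"{p:.3f}" for p in probs)
    print(f"T={temperature}: [{formatted}]")
```

At low temperature nearly all probability lands on the top-scoring token, which suits code and factual answers. Higher values spread probability across alternatives, which gives creative writing more variety.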
Benchmarks
Reasoning, knowledge and coding scores for catalogue models alongside reference open-source baselines. Click any column header to sort.
| Model | Type | MMLU | GPQA | HumanEval | MATH | Avg |
|---|---|---|---|---|---|---|
MMLU = general knowledge · GPQA = graduate-level science (Diamond) · HumanEval = code generation · MATH = competition mathematics. Scores are representative composite figures gathered from public reports and provider cards for orientation only — verify against primary sources before citing.
Community Hub
Copy this link to join our AI community on Discord and chat about local models, hardware and benchmarks.
Poe Integration
Test high-performance open-source models instantly in your browser on Poe. These setups bypass local installation steps and run directly on hosted hardware.
Alibaba's robust dense hybrid reasoning and coding model, highly competent in multilingual generation.
Nous Research's instruction-following flagship built on Llama 3.1, optimized for structured agent tasks.
A 671B Mixture-of-Experts engine delivering rapid token-by-token generation with advanced mathematical reasoning.
Microsoft's small-footprint reasoning model, offering deep chain-of-thought analysis and factual verification.
An efficient 12B collaborative build between Mistral and NVIDIA, optimized for solid multilingual tasks.
The baseline workhorse for open coding applications, supporting multiple software engineering workflows.
Model Playground
Select a model and start chatting. Powered by Groq and Mistral APIs.
