Open-Source Lab

Model Catalogue

A place to test local models

Curated local and open-source AI models. From community workhorses to experimental frontier architectures.

Test Local Models: Try models instantly in your browser without installing them.
Check the specs: Compare parameters, context sizes, and find your niche.
Exchange knowledge: Discuss hardware configurations with other open source enthusiasts.

Model of the Month

Spotlight — June 2026

Qwen3.5-27B
Open · New · Popular

Alibaba's Mixture-of-Experts flagship activates only 22B of its 235B parameters per token. It offers frontier-class reasoning with remarkable efficiency. Hybrid thinking mode toggles chain-of-thought on or off at inference time. Outperforms GPT-4o on multiple benchmarks while remaining fully locally runnable on high-end consumer rigs.

Parameters: 235B / 22B active
Context: 128K
License: Apache 2.0
Architecture: MoE
Language: 100+ langs

Key:
Open: Apache / MIT / similar
New: Released last 90 days
Experimental: Early / research stage
Popular: Community favourite
Local: Consumer hardware

Language & Reasoning Models

Open · Popular
Hermes-3-405B
Nous Research

Built for thoughtful discussion and helpful, structured reasoning. Strong multilingual generation, precise instruction-following, and clean formatting across technical and analytical tasks.

Params: 405B
Context: 128K
License: Llama 3.1
Arch: Dense

Open · New
Hermes-4-70B
Nous Research

The newest Hermes generation in an efficient 70B class. Balanced speed and reasoning depth with controlled verbosity, reliable structured output, and strong code generation and debugging.

Params: 70B
Context: 128K
License: Llama 3.1
Arch: Dense

Open · New
Phi-4
Microsoft Research

Phi-Reasoning delivers long, thorough reasoning responses for deep research and complex problem solving. Optimal for in-depth analysis, academic assistance, and technical explanations.

Params: 14B
Mode: CoT
License: MIT
Files: PDF ✓

Open · New · Local
Qwen3.5-27B
Alibaba / Qwen Team

A 27B dense hybrid model pairing Gated Delta Networks with attention for long-context efficiency. Strong at coding, reasoning, and agent tasks across 201 languages under Apache 2.0.

Params: 27B
Context: 262K
License: Apache 2.0
Language: 201 langs

Open · Local
Mistral-NeMo
Mistral AI / NVIDIA

A 12B collaboration between Mistral and NVIDIA, designed to be a drop-in efficient workhorse. State-of-the-art multilingual coverage, large context, and strong instruction following on a single GPU.

Params: 12B
Context: 128K
License: Apache 2.0
Language: Multilingual

Open · Popular
DeepSeek-V3.1
DeepSeek

A 671B MoE flagship activating 37B per token. Exceptional math and reasoning with MIT-licensed weights. A community benchmark king that rivals closed frontier labs at a fraction of the cost.

Params: 671B / 37B active
Context: 128K
License: MIT
Arch: MoE

Code & Engineering Models

New · Popular
Laguna-M.1
Poolside

Poolside's leading coding agent model for intricate software development and agentic engineering workflows. Excels at repository-scale reasoning, debugging, refactoring, and autonomous development.

Context: 128K
Output: 8K
Quant: fp8
Focus: Agent

Open · New
GLM-5.1
Zhipu AI

An advanced model combining deep engineering productivity with speed and emotional intelligence. Excels at autonomously driving end-to-end project execution, debugging, and log analysis.

Params: 754B
Context: 205K
License: MIT
Files: Multifile

Open · New
Minimax-M2.7
MiniMax

A fast, next-generation model combining engineering productivity with speed. Stands out at autonomously driving end-to-end project execution and handling complex log analysis and debugging.

Context: 205K
Files: Multimodal
Focus: Agentic
Profile: Fast

Open · New
MiMo-V2-Flash
Xiaomi MiMo

A cheap MoE reasoning model designed for coding and agent performance. Dynamically routes tokens through specialized experts with a hybrid attention architecture for fast inference.

Params: 309B / 15B active
Context: 256K
Arch: MoE
Files: Multimodal

Open · Local
Qwen2.5-Coder-32B
Alibaba / Qwen Team

The open coding workhorse that matches GPT-4o on code benchmarks while running locally. Supports 92 programming languages, fill-in-the-middle completion, and long repository context.

Params: 32B
Context: 128K
License: Apache 2.0
Code: 92 langs

Open · Popular
DeepSeek-Coder-V2
DeepSeek

A MoE code specialist activating 21B of 236B parameters. State-of-the-art on HumanEval and competitive programming, with strong repo-level understanding and an MIT-licensed Lite variant for local rigs.

Params: 236B / 21B active
Context: 128K
License: MIT
Arch: MoE

Vision & Multimodal

Open · New · Local
Mistral-Small-4
Mistral AI

A unified iteration consolidating several Mistral models into one system: Magistral's reasoning, Pixtral's multimodal understanding, and Devstral's agentic coding in one workflow.

Modality: Multimodal
Magistral: Reasoning
Devstral: Coding
Pixtral: Vision

New
Nova-Lite-2
Amazon

A cost-effective multimodal foundation model that efficiently processes text, images, and video, prioritizing speed and affordability with a massive 1M-token context window.

Context: 1M
Modality: Txt+Img+Vid
Maker: Amazon
Files: PDF/JPG

Open · Popular
Qwen2.5-VL-72B
Alibaba / Qwen Team

A leading open vision-language model with precise document parsing, object grounding, and long-video understanding. Excels at OCR, chart analysis, and UI agent control.

Params: 72B
Context: 128K
License: Apache 2.0
Vision: OCR+Video

Open · Local
Pixtral-12B
Mistral AI

A natively multimodal 12B model with a 400M vision encoder. Handles variable image sizes and interleaved text+image at high throughput, runnable on a single 24GB consumer GPU.

Params: 12B
Context: 128K
License: Apache 2.0
Vision: Native VLM

Open
Llama-3.2-90B-Vision
Meta

Meta's flagship open vision model for high-resolution image reasoning, document QA, and visual grounding. Built on the proven Llama 3 backbone with an adapter-based image encoder.

Params: 90B
Context: 128K
License: Llama 3.2
Vision: Image QA

Open · New
InternVL2.5-78B
Shanghai AI Lab / OpenGVLab

A top open-weight vision specialist closing the gap with GPT-4o on multimodal benchmarks. Outstanding at fine-grained perception, multi-image reasoning, and scientific diagram understanding.

Params: 78B
Context: 32K
License: MIT
Vision: Multi-image

Audio & Embedding Models

Open · Popular
Whisper-Large-v3
OpenAI

The de-facto open speech-to-text standard. Robust multilingual transcription and translation across 99 languages with strong noise resilience. Runs in real time via whisper.cpp on consumer hardware.

Params: 1.55B
Language: 99 langs
License: MIT
Task: ASR

Open · New
Qwen2-Audio
Alibaba / Qwen Team

A versatile audio-language model that understands speech, ambient sound, and music. Supports voice-chat and audio-analysis modes for transcription, captioning, and spoken Q&A.

Params: 7B
Modality: Audio+Txt
License: Apache 2.0
Task: Voice Q&A

Open · Local
BGE-M3
BAAI

A versatile embedding model supporting dense, sparse, and multi-vector retrieval in one. Handles 100+ languages and inputs up to 8192 tokens, the go-to backbone for local RAG pipelines.

Params: 560M
Tokens: 8192
License: MIT
Retrieval: Hybrid

Open · Local
Nomic-Embed-v1.5
Nomic AI

A highly tuned 137M-parameter local text embedding model. Features an 8192-token context window and support for variable output dimensions, perfect for local RAG pipelines.

Params: 137M
Tokens: 8192
License: Apache 2.0
Task: Embed

Open · New
F5-TTS
F5-TTS Team

A non-autoregressive speech synthesis model utilizing flow matching. Capable of high-fidelity, zero-shot voice cloning with highly natural prosody and voice inflections.

Params: 335M
Context: Zero-shot
License: MIT
Task: Audio Gen

Open · Experimental
Moshi-v1.0
Kyutai

An experimental real-time spoken audio-to-audio foundation model. Processes speech and generates natural, low-latency voice responses natively.

Params: 7B
Task: Duplex
License: CC-BY 4.0
Audio: Real-time

Experimental & Research

New · Experimental · Local
Trinity-Mini
Open-Source-Lab

A compact, efficiency-first assistant model tuned for fast everyday reasoning, summarization, and chat. Designed to deliver responsive performance on modest hardware while staying lightweight.

Class: Mini
Context: 128K
Profile: Efficient
Arch: Dense

Open · Experimental · Local
BitNet-b1.58-3B
Microsoft Research

A radically efficient 1.58-bit ternary-weight architecture. Matches full-precision quality at a fraction of the memory and energy, hinting at CPU-only LLM inference. Early research weights.

Params: 3B
Weights: 1.58-bit
License: MIT
Arch: Ternary

Open · Experimental
Mamba-2-2.7B
Cartesia / CMU

A high-efficiency State Space Model (SSM) architecture. Boasts linear time complexity and massive context capabilities, matching Transformers with significantly less overhead.

Params: 2.7B
Context: 1M
License: Apache 2.0
Arch: SSM

Open · Experimental
RWKV-6-World-7B
RWKV Foundation

An RNN architecture designed to deliver Transformer-level quality. It features linear attention and low-resource multilingual execution without traditional scaling penalties.

Params: 7B
Context: Infinite
License: Apache 2.0
Arch: RNN-Trans

Open · Experimental
Samba-CoE-v0.2
SambaNova

A hybrid architecture combining linear attention State Space Models with Mixture-of-Experts (MoE) to handle large context windows with exceptional generation speeds.

Params: 40B / 8B active
Context: 256K
License: Apache 2.0
Arch: SSM-MoE

Open · Experimental
Chameleon-7B
Meta

A native early-stage mixed-modal architecture engineered to ingest and output both images and text sequentially, rather than using separate external projection encoders.

Params: 7B
Context: 8K
License: Llama 3.1
Arch: Mixed-Mod

Ready to Go Local?

Start running open models today. No API keys, no subscriptions, no data leaving your machine.

Model Library

Open-Source Models

The full catalogue of open and open-weight models with their core specs. Search by name or filter by modality.

Documentation

Running Local Models

A practical field guide to running open-weight models on your own hardware: engines, quantization, memory math and performance tips.

1. Pick an inference engine (start here)

Your engine determines speed, hardware support and ease of use. The most popular options in 2026:

  • Ollama — easiest entry point; one-line model pulls and a built-in API. Great for beginners and Mac users.
  • llama.cpp — the C/C++ core powering most local tooling. Runs GGUF models on CPU, CUDA, Metal, Vulkan and ROCm.
  • vLLM — high-throughput GPU serving with PagedAttention; ideal for multi-user or production workloads.
  • LM Studio — polished desktop GUI for browsing, downloading and chatting with GGUF models, no terminal needed.
$ ollama run qwen2.5-coder:32b
$ ./llama-server -m model.gguf -ngl 99 -c 8192
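
The built-in API mentioned for Ollama can also be called from a script. The following is a minimal sketch, assuming a default local Ollama install listening on http://localhost:11434 and a model tag that has already been pulled. The helper names `build_generate_request` and `generate` are illustrative and not part of Ollama itself.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

With the server running, `generate("qwen2.5-coder:32b", "Write a one-line Python hello world.")` should return the completion as a string. Setting `"stream": False` keeps the sketch short; a chat UI would more likely stream tokens as they arrive.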

2. Understand quantization

Quantization shrinks model weights from 16-bit down to 8, 4 or fewer bits, trading a little quality for big memory savings. Rules of thumb:

  • Q8_0 — near-lossless, ~1 byte/param. Use when you have the VRAM.
  • Q4_K_M — the sweet spot. ~4.5 bits/param with minimal quality loss; the default most people should run.
  • Q3 / Q2 — emergency tier for fitting big models on small cards; noticeable degradation.
  • Prefer a larger model at Q4 over a smaller model at Q8 when quality matters.
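
To make the quality-versus-size trade-off concrete, here is a toy symmetric block quantizer. It is a simplified illustration only: real GGUF formats such as Q8_0 and Q4_K_M use their own block layouts and scale encodings, and the sample weights below are made up.

```python
def quantize_block(values: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric quantization of one block: small integers plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quants, scale


def dequantize_block(quants: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quants]


def max_error(values: list[float], bits: int) -> float:
    """Largest absolute difference between original and restored values."""
    quants, scale = quantize_block(values, bits)
    restored = dequantize_block(quants, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Made-up weights standing in for one block of a real tensor.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.45, 0.02]
err8 = max_error(weights, 8)
err4 = max_error(weights, 4)
print(f"8-bit max error: {err8:.4f}")
print(f"4-bit max error: {err4:.4f}")
```

The rounding error per value is bounded by half the scale, so fewer bits means a coarser grid and a larger worst-case error. That is the "little quality" being traded for memory.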

3. Memory math (VRAM estimate)

Approximate VRAM to load a model: params × bytes-per-param + KV-cache. At Q4 you need roughly 0.55 GB per billion parameters, plus context overhead.

  • 7B @ Q4 → ~5 GB · fits an 8 GB card.
  • 14B @ Q4 → ~9 GB · fits 12 GB.
  • 32B @ Q4 → ~20 GB · needs 24 GB (RTX 4090 / 3090).
  • 70B @ Q4 → ~42 GB · dual-GPU or 48 GB+, or offload to RAM.
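
The weights part of this rule of thumb can be sketched in a few lines. This assumes about 4.5 bits per parameter for Q4_K_M, as stated in the quantization section, and counts 1 GB as 1e9 bytes. The function name is illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # Billions of parameters times bytes per parameter gives GB directly.
    return params_billion * bytes_per_param


for size in (7, 14, 32, 70):
    print(f"{size}B @ ~4.5 bits: {weight_memory_gb(size):.1f} GB for weights")
```

The results cover weights only, so they come out somewhat below the rounded figures in the list, which presumably leave room for the KV cache and runtime overhead.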

Long context inflates the KV cache; a 128K window can add several GB on top.
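
A rough way to size the KV cache is to count the key and value vectors stored per token. The sketch below uses a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) chosen only for illustration; check the model card for the real values. The 1-byte case ignores the small per-block overhead that real quantized caches carry.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 accounts for storing both a key and a value vector.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


# Hypothetical model shape, not taken from any specific catalogue entry.
layers, kv_heads, head_dim = 32, 8, 128
for context in (8_192, 131_072):
    fp16 = kv_cache_bytes(layers, kv_heads, head_dim, context, 2) / 1e9
    q8 = kv_cache_bytes(layers, kv_heads, head_dim, context, 1) / 1e9
    print(f"{context:>7} tokens: {fp16:.2f} GB at 16-bit, {q8:.2f} GB at 8-bit")
```

The cache grows linearly with context length, which is why a long window costs far more memory than a short one on the same model.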

4. Hardware tips

  • Apple Silicon shines for local LLMs thanks to unified memory — an M-series with 64GB+ runs large MoE models comfortably.
  • NVIDIA remains fastest via CUDA; the 3090/4090's 24GB is the consumer sweet spot.
  • CPU + RAM offload works for big models but is slow; expect single-digit tokens/sec.
  • MoE models (e.g. Qwen3.5, DeepSeek-V3) activate only a fraction of params per token, so they run far faster than their total size suggests.
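
The MoE point can be illustrated with a common back-of-the-envelope bound: token generation is often limited by how fast the active weights can be read from memory. This sketch assumes a hypothetical 400 GB/s of memory bandwidth and ignores KV-cache reads, compute time and routing overhead, so treat the results as rough upper bounds rather than predictions.

```python
def decode_tokens_per_sec(active_params_billion: float,
                          bandwidth_gb_s: float,
                          bits_per_param: float = 4.5) -> float:
    """Upper bound on tokens/sec if every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_param / 8
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 400.0  # hypothetical figure, not a measured value

dense = decode_tokens_per_sec(235, BANDWIDTH_GB_S)  # all 235B read per token
moe = decode_tokens_per_sec(22, BANDWIDTH_GB_S)     # only 22B active per token
print(f"dense 235B: up to {dense:.1f} tok/s")
print(f"MoE 235B / 22B active: up to {moe:.1f} tok/s")
```

The speed-up applies to generation only. All of a MoE model's weights still have to fit in memory, because any expert may be selected for the next token.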

5. Performance & quality tips

  • Set -ngl (GPU layers) as high as your VRAM allows; if the whole model fits, offload every layer to the GPU.
  • Lower temperature (0.1–0.3) for code and factual tasks; raise it (0.7–1.0) for creative writing.
  • Enable flash attention and KV-cache quantization to fit longer contexts.
  • For reasoning models, allow plenty of output tokens so chain-of-thought isn't cut off.
  • Use the chat template that matches the model family; a wrong template silently wrecks output.
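
The effect of temperature is easiest to see on a small example. The sketch below applies a temperature-scaled softmax to three made-up logits. Real inference engines combine this with other samplers such as top-k and top-p, which are left out here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability lands on the top-scoring token, which suits code and factual answers. Higher values spread probability across alternatives, which gives more varied text.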

Benchmarks

Model Benchmarks

Reasoning, knowledge and coding scores for catalogue models alongside reference open-source baselines.

Model | Type | MMLU | GPQA | HumanEval | MATH | Avg

MMLU = general knowledge · GPQA = graduate-level science (Diamond) · HumanEval = code generation · MATH = competition mathematics. Scores are representative composite figures gathered from public reports and provider cards for orientation only — verify against primary sources before citing.

Community Hub

Join us on Discord

Copy this link to join our AI community on Discord and chat about local models, hardware and benchmarks.

Open Discord

Poe Integration

Open-Source Models on Poe.com

Test high-performance open-source models instantly in your browser on Poe. These setups bypass local installation steps and run directly on hosted hardware.

Qwen3.5-27B

Alibaba's robust dense hybrid reasoning and coding model, highly competent in multilingual generation.

Hermes-3-405B

Nous Research's instruction-following flagship built on Llama 3.1, optimized for structured agent tasks.

DeepSeek-V3

A 671B Mixture-of-Experts model delivering fast generation with advanced mathematical reasoning.

Phi-4

Microsoft's small-footprint reasoning model, offering deep chain-of-thought analysis and factual verification.

Mistral-NeMo

An efficient 12B collaborative build between Mistral and NVIDIA, optimized for solid multilingual tasks.

Qwen2.5-Coder-32B

The baseline workhorse for open coding applications, supporting multiple software engineering workflows.

Model Playground

Chat with Open Models

Select a model and start chatting. Powered by Groq and Mistral APIs.