This is the curated catalogue of open-source large language models you can install with one click in Linux Lite's MyAI assistant. All models run locally via Ollama: no cloud, no telemetry, no account. Sizes assume Q4_K_M quantisation; minimum VRAM is what you need for usable GPU-accelerated inference (with less VRAM, MyAI falls back to CPU+RAM, which is much slower). Source of truth: /usr/share/myai/hardware-profiles.json. Updated 12/05/26.
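The Q4_K_M sizes in the table below can be sanity-checked with a rough rule of thumb: Q4_K_M averages a bit under 5 bits per weight, so on-disk size scales almost linearly with parameter count. A minimal Python sketch — the 4.85 bits/weight figure is an approximation commonly cited for llama.cpp's Q4_K_M, not a value MyAI itself uses:

```python
def q4_km_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough on-disk size of a Q4_K_M quantised model.

    Q4_K_M mixes 4- and 6-bit blocks, averaging roughly 4.85 bits per
    weight. This is an estimate only: real files add metadata and vary
    slightly by architecture.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB, as model pages usually report

# Estimates land within ~10% of the catalogue:
# 8B  -> ~4.9 GB (catalogue lists 4.7 GB for llama3.1:8b)
# 70B -> ~42 GB  (catalogue lists 43 GB for llama3.3:70b)
```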
Running large language models locally on Linux means your prompts, files and conversations never leave the machine. There's no API key, no rate limit, no subscription, and no upload of sensitive data to a third-party server. MyAI is the simplest free ChatGPT alternative for users who want privacy, offline access, and full control over which open-source model they use — from small language models (SLMs) like Gemma 3 Mini (1B) that run on a basic laptop, all the way to flagship 70B+ models on a workstation GPU.
MyAI ships with a hardware-aware recommendation engine that detects your CPU, RAM, NVIDIA / AMD GPU and VRAM, then shows you only the models that will actually run well. Whether you're looking for the best AI for coding, writing, math, agentic workflows, or a fast everyday assistant, the curated table below has a hand-picked option for every tier.
| Model | Size (Q4) | Min VRAM | Min RAM | Notes |
|---|---|---|---|---|
| Gemma 3 Mini (`gemma3:1b`) | 815 MB | — | 4 GB | Google. Compact 1B open-source LLM. Fast everywhere, ideal for casual chat and on-device AI on older Linux laptops. *small, cpu-friendly* |
| Llama 3.2 (`llama3.2:3b`) | 2.0 GB | — | 6 GB | Meta. 3B all-rounder. Best default open-source LLM for typical Linux desktops. Great balance of speed and quality. *balanced* |
| Mistral 7B (`mistral:7b`) | 4.1 GB | 6 GB | 8 GB | Mistral AI. Strong open-source reasoning and instruction-following. A favourite for general-purpose writing and Q&A workflows. *quality* |
| GLM 4.7 Flash (`glm-4.7-flash`) | 5.5 GB | 6 GB | 10 GB | Zhipu AI. Lightweight "flash" variant of GLM 4.7, optimised for fast responses with balanced quality. Great for snappy chat. *balanced* |
| Llama 3.1 8B (`llama3.1:8b`) | 4.7 GB | 6 GB | 10 GB | Meta. Flagship small model, very capable for its size. Strong choice for coding, writing, and general-purpose AI on mid-range hardware. *quality* |
| Qwen 2.5 14B (`qwen2.5:14b`) | 9.0 GB | 12 GB | 16 GB | Alibaba. 14B all-rounder. Excellent for long context, multilingual prompts, and math. Needs 12 GB+ VRAM or plenty of RAM. *heavy* |
| Devstral Small 2 (`devstral-small-2`) | 14 GB | 14 GB | 28 GB | Mistral AI. Code-focused 24B model, the best open-source AI for coding and agentic workflows. Strong at multi-file edits and refactoring. *quality* |
| Gemma 2 27B (`gemma2:27b`) | 16 GB | 18 GB | 32 GB | Google. Mid-large dense model. Excellent quality for writing, summarisation, and essay tasks at workstation scale. *large, gpu-recommended* |
| Mixtral 8x7B (`mixtral:8x7b`) | 26 GB | 28 GB | 48 GB | Mistral AI. Mixture-of-experts (47B total, ~13B active per token). Fast for its quality; a popular local ChatGPT alternative. *large, moe, gpu-recommended* |
| Llama 3.3 70B (`llama3.3:70b`) | 43 GB | 42 GB | 64 GB | Meta. Late-2024 70B open-source LLM. Near-frontier quality on a single 48 GB GPU; the strongest local Llama you can run. *xlarge, gpu-only* |
| Qwen 3.6 (`qwen3.6:latest`) | 47 GB | 44 GB | 64 GB | Alibaba. Latest Qwen 3.6 release. Strong general-purpose, long-context, and multilingual performance; a top open-source competitor to GPT-class models. *xlarge, gpu-only* |
| Mixtral 8x22B (`mixtral:8x22b`) | 80 GB | 80 GB | 128 GB | Mistral AI. Large mixture-of-experts (141B total). Workstation/server class; best when you need maximum reasoning depth. *xlarge, moe, gpu-only* |
- **Coding:** Devstral Small 2 (24B) leads for agentic coding and multi-file edits. Qwen 2.5 14B and Llama 3.1 8B are strong runners-up when you have less VRAM. All free, all local, all open-source.
- **Writing:** Gemma 2 27B and Llama 3.3 70B produce the most polished long-form prose. For everyday writing on modest hardware, Mistral 7B punches well above its weight.
- **Math and reasoning:** Qwen 2.5 14B and Qwen 3.6 are standouts for math, structured reasoning and multilingual tasks. Mixtral 8x7B is the best mixture-of-experts option.
- **Low-end hardware:** Gemma 3 Mini (1B, ~815 MB) runs on 4 GB of RAM with no GPU required, the lightest reliable small language model in the catalogue. Llama 3.2 (3B) is a step up if you have 6 GB.
- **Fast chat:** GLM 4.7 Flash (Zhipu AI) is optimised for snappy responses. On the smaller end, Llama 3.2 (3B) is a great quick-reply default.
- **ChatGPT replacement:** For a free, private, offline replacement for ChatGPT: Mixtral 8x7B on a 24–48 GB GPU, or Llama 3.3 70B if you have a workstation card. Zero account, zero data leaves your machine.
| Tier | Triggered by | Recommended default | Models offered |
|---|---|---|---|
| Light | < 6 GB RAM, no GPU | Gemma 3 Mini | 1 model — Gemma 3 Mini |
| Standard | 6–15 GB RAM, no GPU | Llama 3.2 | 3 models — adds Llama 3.2 |
| Roomy CPU | 16–31 GB RAM, no GPU | Llama 3.2 | 5 models — adds Mistral 7B, GLM 4.7 Flash |
| Workstation CPU | 32 GB+ RAM, no GPU | Llama 3.1 8B | 7 models — adds Llama 3.1 8B, Qwen 2.5 14B |
| GPU (small) | < 6 GB VRAM (single or summed) | Llama 3.2 | 2 models — Gemma 3 Mini, Llama 3.2 |
| GPU (mid) | 6–11 GB VRAM | Mistral 7B | 6 models — adds Mistral 7B, GLM 4.7 Flash, Llama 3.1 8B |
| GPU (high) | 12–23 GB VRAM | Qwen 2.5 14B | 8 models — adds Qwen 2.5 14B, Devstral Small 2 |
| GPU (very high) | 24–47 GB VRAM | Mixtral 8x7B | 10 models — adds Gemma 2 27B, Mixtral 8x7B |
| GPU (extreme) | 48 GB+ VRAM | Llama 3.3 70B | All 13 models — adds Llama 3.3 70B, Qwen 3.6, Mixtral 8x22B |
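The tier logic above boils down to a threshold lookup. Here is an illustrative reimplementation in Python: the thresholds come straight from the tier table, but the function name and signature are hypothetical, not MyAI's actual code.

```python
def myai_tier(ram_gb: int, vram_gb: int = 0) -> str:
    """Pick a MyAI hardware tier from total RAM and (summed) VRAM.

    Illustrative sketch mirroring the thresholds in the tier table;
    GPU tiers take precedence whenever any usable VRAM is present.
    """
    if vram_gb >= 48:
        return "GPU (extreme)"
    if vram_gb >= 24:
        return "GPU (very high)"
    if vram_gb >= 12:
        return "GPU (high)"
    if vram_gb >= 6:
        return "GPU (mid)"
    if vram_gb > 0:
        return "GPU (small)"
    if ram_gb >= 32:
        return "Workstation CPU"
    if ram_gb >= 16:
        return "Roomy CPU"
    if ram_gb >= 6:
        return "Standard"
    return "Light"

# e.g. a 16 GB laptop with no GPU lands in "Roomy CPU";
# a dual-RTX-3090 rig (48 GB summed VRAM) lands in "GPU (extreme)".
```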
**What is the best local AI model for coding?** Devstral Small 2 (24B) is Mistral's code-focused open-source model and the best local option for agentic coding, code review and multi-file edits. Llama 3.1 8B and Qwen 2.5 14B are strong general-purpose alternatives when VRAM is tight. All run on Linux Lite via MyAI with no cloud or account required.
**Can I run an LLM locally on Linux Lite?** Yes. MyAI on Linux Lite uses Ollama under the hood to run open-source LLMs entirely on your hardware. Smaller models like Gemma 3 Mini run on CPU with as little as 4 GB of RAM; larger flagship models like Llama 3.3 70B need a workstation GPU. MyAI's hardware-aware picker selects models that will actually run on your machine.
**What are the best open-source LLMs to run locally?** The strongest open-source LLMs for local inference are Meta's Llama 3.3 70B, Mistral's Mixtral 8x7B / 8x22B, Google's Gemma 2 27B, Alibaba's Qwen 3.6 and Mistral's Devstral Small 2 (best for coding). For fast chat, Zhipu's GLM 4.7 Flash is excellent. The best small language model (SLM) for low-end hardware is Gemma 3 Mini.
**Can I use models that aren't in the catalogue, such as DeepSeek?** MyAI ships with a curated catalogue of vetted models, but any Ollama-compatible model, including DeepSeek variants, can be pulled directly with `ollama pull <model>` after installing MyAI. The recommendation engine matches available models to your CPU, RAM, GPU and VRAM automatically.
**What hardware do I need to run a local LLM?** For small language models (1–3B): 2–6 GB of RAM and any modern CPU. For mid-size models (7–14B): 10–16 GB of RAM or a 6–12 GB GPU. For large models (27B–70B+): 32 GB+ of RAM and 18–48 GB of VRAM. MyAI sums VRAM across multiple GPUs: always for NVIDIA, and for AMD cards that are ROCm-eligible.
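That VRAM-summing rule can be sketched as a small function. This is a minimal illustration assuming a plain list of detected GPUs; the data shape and names are hypothetical, not MyAI's internals:

```python
from dataclasses import dataclass

@dataclass
class Gpu:
    vendor: str            # "nvidia" or "amd"
    vram_gb: int
    rocm_ok: bool = False  # AMD only: is the card usable via ROCm?

def usable_vram_gb(gpus: list[Gpu]) -> int:
    """Sum VRAM the way the text describes: every NVIDIA card counts,
    while AMD cards count only when they are ROCm-eligible."""
    total = 0
    for g in gpus:
        if g.vendor == "nvidia" or (g.vendor == "amd" and g.rocm_ok):
            total += g.vram_gb
    return total

# Example: two 24 GB NVIDIA cards plus an AMD card without ROCm support
# yields 48 GB of usable VRAM (the AMD card is ignored).
rig = [Gpu("nvidia", 24), Gpu("nvidia", 24), Gpu("amd", 16, rocm_ok=False)]
```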
**What is MCP and how does MyAI relate to it?** MCP (Model Context Protocol) is an open standard for connecting AI assistants to external tools, files and data sources, popularised in 2024. MyAI runs models locally via Ollama's HTTP API, which MCP-compatible clients and agentic workflows can talk to directly. This lets you build private AI agents that operate on your own data without a cloud round-trip.
**Is MyAI free and private?** Yes. MyAI is free, open-source, and runs entirely on your own computer. No accounts. No subscriptions. No cloud. No telemetry. After the first model download it works fully offline: a true private alternative to ChatGPT, Claude or Gemini for users who care about data ownership.
**Should I choose Llama, Mistral or Qwen?** Llama 3.1 8B is the best general-purpose all-rounder for typical desktops. Mistral 7B excels at instruction-following and reasoning at smaller sizes. Qwen 2.5 14B and Qwen 3.6 are strongest for multilingual tasks, long context and math. For coding specifically, Devstral Small 2 (also from Mistral) outperforms all three.
**What is RAG and can I use it with MyAI?** RAG (retrieval-augmented generation) combines a local LLM with a search step over your own documents, so the model can answer questions grounded in your data without uploading it anywhere. MyAI exposes Ollama's HTTP API on 127.0.0.1:7070, which any RAG framework (LangChain, LlamaIndex, etc.) can target for fully local retrieval-augmented workflows.
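As a concrete starting point, here is a minimal sketch of talking to that local endpoint with Python's standard library. The 127.0.0.1:7070 address comes from the text above; `/api/generate` is Ollama's standard generation endpoint; the model name, function, and prompt format are illustrative assumptions, not a fixed MyAI API:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:7070/api/generate"  # MyAI's local Ollama port

def build_request(model: str, prompt: str, context_docs: list[str]) -> bytes:
    """Assemble a non-streaming Ollama request whose prompt is grounded
    in locally retrieved documents (the retrieval step of RAG happens
    before this, e.g. via LangChain or LlamaIndex)."""
    grounded = (
        "Answer using only this context:\n\n"
        + "\n---\n".join(context_docs)
        + "\n\nQuestion: " + prompt
    )
    return json.dumps(
        {"model": model, "prompt": grounded, "stream": False}
    ).encode("utf-8")

# Sending it (requires MyAI/Ollama to be running locally):
# body = build_request("llama3.2:3b", "What is our refund policy?", docs)
# req = urllib.request.Request(
#     OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```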
**Does MyAI work offline?** Yes. The only network activity is the initial model download (typically 0.6–80 GB depending on which model you pick). Once a model is on disk, MyAI runs entirely offline; no internet connection is required for chat, coding help or any other AI task.