MyAI  ·  Run Open-Source LLMs Locally on Linux Lite

The curated catalogue of open-source large language models you can install with one click in Linux Lite's MyAI assistant. All models run locally via Ollama — no cloud, no telemetry, no account. Sizes assume Q4_K_M quantisation; minimum VRAM is for usable GPU-accelerated inference (lower VRAM falls back to CPU+RAM, much slower). Source of truth: /usr/share/myai/hardware-profiles.json. Updated 12/05/26.

Why run an LLM locally?

Running large language models locally on Linux means your prompts, files and conversations never leave the machine. There's no API key, no rate limit, no subscription, and no upload of sensitive data to a third-party server. MyAI is the simplest free ChatGPT alternative for users who want privacy, offline access, and full control over which open-source model they use — from small language models (SLMs) like Gemma 3 Mini (1B) that run on a basic laptop, all the way to flagship 70B+ models on a workstation GPU.

MyAI ships with a hardware-aware recommendation engine that detects your CPU, RAM, NVIDIA / AMD GPU and VRAM, then shows you only the models that will actually run well. Whether you're looking for the best AI for coding, writing, math, agentic workflows, or a fast everyday assistant, the curated table below has a hand-picked option for every tier.
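For the curious, the raw facts that feed the recommendation engine can be inspected by hand with standard tools. This is an illustrative sketch only, not MyAI's actual detection code:

```shell
# Total system RAM in GiB (MemTotal in /proc/meminfo is reported in kB)
awk '/MemTotal/ {printf "RAM: %.1f GiB\n", $2 / 1048576}' /proc/meminfo

# Per-card NVIDIA VRAM, if the proprietary driver's nvidia-smi is present
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || true

# AMD VRAM visible to the ROCm stack, if rocm-smi is installed
rocm-smi --showmeminfo vram 2>/dev/null || true
```

MyAI compares these numbers against the thresholds in /usr/share/myai/hardware-profiles.json to pick a tier.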

Curated Open-Source LLM Catalogue
| Model | Ollama tag | Size (Q4) | Min VRAM | Min RAM | Vendor | Tags | Notes |
|-------|-----------|-----------|----------|---------|--------|------|-------|
| Gemma 3 Mini | gemma3:1b | 815 MB | none (CPU) | 4 GB | Google | small, cpu-friendly | Google's compact 1B open-source LLM. Fast everywhere; ideal for casual chat and on-device AI on older Linux laptops. |
| Llama 3.2 | llama3.2:3b | 2.0 GB | none (CPU) | 6 GB | Meta | balanced | 3B all-rounder. Best default open-source LLM for typical Linux desktops; great balance of speed and quality. |
| Mistral 7B | mistral:7b | 4.1 GB | 6 GB | 8 GB | Mistral AI | quality | Strong open-source reasoning and instruction-following. A favourite for general-purpose writing and Q&A workflows. |
| GLM 4.7 Flash | glm-4.7-flash | 5.5 GB | 6 GB | 10 GB | Zhipu AI | balanced | Lightweight "flash" variant of GLM 4.7. Optimised for fast responses with balanced quality; great for snappy chat. |
| Llama 3.1 8B | llama3.1:8b | 4.7 GB | 6 GB | 10 GB | Meta | quality | Meta's flagship small model, very capable for its size. Strong choice for coding, writing, and general-purpose AI on mid-range hardware. |
| Qwen 2.5 14B | qwen2.5:14b | 9.0 GB | 12 GB | 16 GB | Alibaba | heavy | 14B all-rounder. Excellent for long context, multilingual prompts, and math. Needs 12 GB+ VRAM or plenty of RAM. |
| Devstral Small 2 | devstral-small-2 | 14 GB | 14 GB | 28 GB | Mistral AI | quality | Code-focused 24B model; the best open-source AI for coding and agentic workflows. Strong at multi-file edits and refactoring. |
| Gemma 2 27B | gemma2:27b | 16 GB | 18 GB | 32 GB | Google | large, gpu-recommended | Google's mid-large dense model. Excellent quality for writing, summarisation, and essay tasks at workstation scale. |
| Mixtral 8x7B | mixtral:8x7b | 26 GB | 28 GB | 48 GB | Mistral AI | large, moe, gpu-recommended | Mixture-of-experts (47B total, ~13B active per token). Fast for its quality; a popular local ChatGPT alternative. |
| Llama 3.3 70B | llama3.3:70b | 43 GB | 42 GB | 64 GB | Meta | xlarge, gpu-only | Meta's late-2024 70B open-source LLM. Near-frontier quality on a single 48 GB GPU; the strongest local Llama you can run. |
| Qwen 3.6 | qwen3.6:latest | 47 GB | 44 GB | 64 GB | Alibaba | xlarge, gpu-only | Latest Qwen 3.6 release. Strong general-purpose, long-context, multilingual; a top open-source competitor to GPT-class models. |
| Mixtral 8x22B | mixtral:8x22b | 80 GB | 80 GB | 128 GB | Mistral AI | xlarge, moe, gpu-only | Large mixture-of-experts (141B). Workstation/server class; best when you need maximum reasoning depth. |

"none (CPU)" means the model runs acceptably on CPU with the listed RAM; no GPU is required.
Best Open-Source AI by Use Case

Best AI for coding

Devstral Small 2 (24B) leads for agentic coding and multi-file edits. Qwen 2.5 14B and Llama 3.1 8B are strong runners-up when you have less VRAM. All free, all local, all open-source.

Best AI for writing

Gemma 2 27B and Llama 3.3 70B produce the most polished long-form prose. For everyday writing on modest hardware, Mistral 7B punches well above its weight.

Best AI for math & reasoning

Qwen 2.5 14B and Qwen 3.6 are stand-outs for math, structured reasoning and multilingual tasks. Mixtral 8x7B is the best mixture-of-experts option.

Best AI for low-end hardware

Gemma 3 Mini (1B, ~815 MB) runs on 4 GB of RAM with no GPU required — the lightest reliable small language model in the catalogue. Llama 3.2 (3B) is a step up if you have 6 GB.

Best AI for fast chat

GLM 4.7 Flash (Zhipu AI) is optimised for snappy responses. On the smaller end, Llama 3.2 (3B) is a great quick-reply default.

Best ChatGPT alternative

For a free, private, offline replacement for ChatGPT: Mixtral 8x7B on a 24–48 GB GPU, or Llama 3.3 70B if you have a workstation card. Zero account, zero data leaves your machine.

Hardware Tier Mapping
| Tier | Triggered by | Recommended default | Models offered |
|------|--------------|---------------------|----------------|
| Light | < 6 GB RAM, no GPU | Gemma 3 Mini | 1 model: Gemma 3 Mini |
| Standard | 6–15 GB RAM, no GPU | Llama 3.2 | 3 models: adds Llama 3.2 |
| Roomy CPU | 16–31 GB RAM, no GPU | Llama 3.2 | 5 models: adds Mistral 7B, GLM 4.7 Flash |
| Workstation CPU | 32 GB+ RAM, no GPU | Llama 3.1 8B | 7 models: adds Llama 3.1 8B, Qwen 2.5 14B |
| GPU (small) | < 6 GB VRAM (single or summed) | Llama 3.2 | 2 models: Gemma 3 Mini, Llama 3.2 |
| GPU (mid) | 6–11 GB VRAM | Mistral 7B | 6 models: adds Mistral 7B, GLM 4.7 Flash, Llama 3.1 8B |
| GPU (high) | 12–23 GB VRAM | Qwen 2.5 14B | 8 models: adds Qwen 2.5 14B, Devstral Small 2 |
| GPU (very high) | 24–47 GB VRAM | Mixtral 8x7B | 10 models: adds Gemma 2 27B, Mixtral 8x7B |
| GPU (extreme) | 48 GB+ VRAM | Llama 3.3 70B | All 13 models: adds Llama 3.3 70B, Qwen 3.6, Mixtral 8x22B |
Frequently Asked Questions
What is the best AI for coding you can run locally?

Devstral Small 2 (24B) is Mistral's code-focused open-source model — the best local option for agentic coding, code review and multi-file edits. Llama 3.1 8B and Qwen 2.5 14B are strong general-purpose alternatives when VRAM is tight. All run on Linux Lite via MyAI with no cloud or account required.

Can I run an LLM locally on Linux?

Yes. MyAI on Linux Lite uses Ollama under the hood to run open-source LLMs entirely on your hardware. Smaller models like Gemma 3 Mini run on CPU with as little as 4 GB of RAM; larger flagship models like Llama 3.3 70B need a workstation GPU. MyAI's hardware-aware picker selects models that will actually run on your machine.
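Assuming Ollama is already installed (MyAI sets it up for you), the command-line equivalent of a one-click install looks like this:

```shell
# One-time download (~815 MB for this model), then an interactive local chat.
ollama pull gemma3:1b
ollama run gemma3:1b    # type your prompt; press Ctrl+D to exit
```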

What are the best open-source LLMs in 2026?

The strongest open-source LLMs for local inference are Meta Llama 3.3 70B, Mistral Mixtral 8x7B / 8x22B, Google Gemma 2 27B, Alibaba Qwen 3.6 and Mistral Devstral Small 2 (best for coding). For fast chat, Zhipu GLM 4.7 Flash is excellent. The best small language model (SLM) for low-end hardware is Gemma 3 Mini.

How do I run DeepSeek or other LLMs locally on Linux Lite?

MyAI ships with a curated catalogue of vetted models, but any Ollama-compatible model — including DeepSeek variants — can be pulled directly with ollama pull <model> after installing MyAI. The recommendation engine matches available models to your CPU, RAM, GPU and VRAM automatically.
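For example, assuming MyAI (and therefore Ollama) is installed, a DeepSeek variant is fetched like any other registry model. The tag below is illustrative; check the Ollama library for current tags:

```shell
ollama pull deepseek-r1:8b   # download an uncurated model by name:tag
ollama list                  # confirm it now appears alongside the curated models
ollama rm deepseek-r1:8b     # remove it again to reclaim disk space
```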

What hardware do I need to run a local LLM?

For small language models (1–3B): 2–6 GB RAM and any modern CPU. For mid-size models (7–14B): 10–16 GB RAM or a 6–12 GB GPU. For large models (27B–70B+): 32 GB+ RAM and 18–48 GB VRAM. On multi-GPU systems, MyAI sums VRAM across NVIDIA cards, and across AMD cards when they are ROCm-eligible.
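As a rough rule of thumb (an approximation for planning, not MyAI's actual sizing logic): Q4_K_M averages about 4.8 bits per weight, i.e. roughly 0.6 bytes per parameter, and you want around 1.5 GB of extra headroom for the KV cache and runtime buffers:

```shell
# Back-of-envelope VRAM estimate for a Q4_K_M model, given billions of parameters.
estimate_vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.6 + 1.5 }'
}

estimate_vram_gb 8    # ~6.3 GB; the catalogue lists 6 GB min VRAM for Llama 3.1 8B
estimate_vram_gb 70   # ~43.5 GB; the catalogue lists 42 GB for Llama 3.3 70B
```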

What is MCP (Model Context Protocol)?

MCP is an open standard, introduced by Anthropic in late 2024, for connecting AI assistants to external tools, files and data sources. MyAI runs models locally via Ollama's HTTP API, which MCP-compatible clients and agentic workflows can talk to directly. This lets you build private AI agents that operate on your own data without a cloud round-trip.
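A minimal sketch of what such a client sends, assuming a model has already been pulled (127.0.0.1:7070 is the address this page documents; stock Ollama defaults to port 11434):

```shell
# Single non-streaming chat turn against the local Ollama-compatible endpoint.
curl -s http://127.0.0.1:7070/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "Summarise MCP in one sentence."}],
  "stream": false
}'
```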

Is MyAI a free ChatGPT alternative?

Yes. MyAI is free, open-source, and runs entirely on your own computer. No accounts. No subscriptions. No cloud. No telemetry. After the first model download it works fully offline — a true private alternative to ChatGPT, Claude or Gemini for users who care about data ownership.

Llama vs Mistral vs Qwen — which open-source LLM should I pick?

Llama 3.1 8B is the best general-purpose all-rounder for typical desktops. Mistral 7B excels at instruction-following and reasoning at smaller sizes. Qwen 2.5 14B / Qwen 3.6 are strongest for multilingual tasks, long context and math. For coding specifically, Devstral Small 2 (also from Mistral) outperforms all three.

What is RAG (Retrieval-Augmented Generation)?

RAG combines a local LLM with a search step over your own documents, so the model can answer questions grounded in your data without uploading it anywhere. MyAI exposes Ollama's HTTP API on 127.0.0.1:7070, which any RAG framework (LangChain, LlamaIndex, etc.) can target for fully local retrieval-augmented workflows.
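A RAG pipeline needs two local endpoints: embeddings at indexing and query time, plus chat/generate for the final answer. A hedged sketch of the embedding call (nomic-embed-text is an example embedding model from the Ollama library, not part of the curated catalogue above):

```shell
# Embed a document chunk locally; a vector store would index the returned vector.
curl -s http://127.0.0.1:7070/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Linux Lite ships MyAI, a local LLM assistant built on Ollama."
}'
```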

Does MyAI work offline?

Yes. The only network activity is the initial model download (roughly 0.8–80 GB, depending on which model you pick). Once a model is on disk, MyAI runs entirely offline; no internet connection is required for chat, coding help or any other AI task.