Question 1

What is an open-source AI model?

Accepted Answer

An open-source AI model is an AI whose weights or code are publicly released, allowing anyone to download, modify, and self-host it. Popular examples include Meta's Llama, Mistral AI's Mistral, Alibaba's Qwen, and Stability AI's Stable Diffusion.

Question 2

How do I self-host an open-source AI model?

Accepted Answer

Popular tools: Ollama (easiest, local deployment), LM Studio (GUI, beginner-friendly), vLLM (high-performance server), llama.cpp (lightweight CPU/GPU inference). Choose a quantized version matching your GPU VRAM.

Question 3

How much VRAM do I need to run an AI model?

Accepted Answer

A 7B model (Q4 quantization) needs approximately 4-6 GB VRAM; 13B needs 8-10 GB; 70B needs 40-48 GB. OSAI Centre provides detailed VRAM requirement tables for every model.

Question 4

What types of AI models does OSAI Centre cover?

Accepted Answer

OSAI Centre covers 160+ open-source AI models: LLMs (Llama, Qwen, Mistral), image generation (Stable Diffusion, FLUX), speech recognition and synthesis, code generation, multimodal models, video generation, and AI agent frameworks.

Question 5

What is quantization in AI models?

Accepted Answer

Quantization compresses model weights (e.g., FP16 to Q4) to reduce VRAM usage and speed up inference. Common formats: Q2, Q3, Q4, Q5, Q6, Q8 — higher numbers mean better quality but require more VRAM.

Question 6

How do I choose the right open-source AI model?

Accepted Answer

Consider: 1) Your GPU VRAM size; 2) Use case (chat, code generation, image generation); 3) Language support; 4) License terms. Use OSAI Centre's VRAM filter and category filter to find the right model quickly.

Question 7

什麼是開源 AI 模型？

Accepted Answer

開源 AI 模型是指原始碼、模型權重或訓練資料公開發布的人工智慧模型，任何人都可以免費下載、修改和自行部署。常見的開源 AI 模型包括 Meta 的 Llama、Mistral AI 的 Mistral、阿里巴巴的 Qwen 等。

Question 8

如何自行部署開源 AI 模型？

Accepted Answer

最常用的工具包括：Ollama（最簡單，適合本機部署）、LM Studio（圖形介面，適合初學者）、vLLM（高效能伺服器部署）、llama.cpp（輕量化 CPU/GPU 推理）。根據你的 GPU VRAM 選擇合適的量化版本即可。

Question 9

我需要多少 VRAM 才能運行 AI 模型？

Accepted Answer

7B 模型（Q4 量化）約需 4-6 GB VRAM；13B 約需 8-10 GB；70B 約需 40-48 GB。OSAI Centre 為每個模型提供詳細的 VRAM 需求表，幫助你選擇合適的模型。

Question 10

什麼是量化（Quantization）？

Accepted Answer

量化通過降低模型權重精度（如 FP16 → Q4）來減少 VRAM 佔用和加快推理速度。常見格式：Q2、Q3、Q4、Q5、Q6、Q8，數字越大精度越高但 VRAM 需求也越大。

Question 11

What is the difference between Llama and GPT-4?

Accepted Answer

Llama (Meta) is open-source and free to self-host — you download the weights and run it on your own hardware with full privacy. GPT-4 (OpenAI) is a closed proprietary model accessible only via paid API. Llama 3.3 70B rivals GPT-4 on many benchmarks while being completely free to run locally.

Question 12

How do I install and use Ollama on Windows?

Accepted Answer

Download the Ollama installer from ollama.com, run it, then open PowerShell or CMD and type: ollama run llama3.2 (or any model name). Ollama automatically downloads the model and starts a local chat. For a GUI, pair it with Open WebUI or LM Studio.

Question 13

Can I run AI models on a Mac?

Accepted Answer

Yes! Macs with Apple Silicon (M1/M2/M3/M4) are excellent for local AI. Ollama natively supports macOS and uses the unified memory (RAM) as VRAM. An M2 Pro with 16 GB RAM can run 7B-13B models smoothly. Use: ollama run qwen2.5:7b in Terminal.

Question 14

Are open-source AI models free to use commercially?

Accepted Answer

It depends on the license. Llama 3.3 allows commercial use for most companies (under 700M monthly users). Qwen 2.5 uses Apache 2.0 (fully free). Mistral models are Apache 2.0. Always check the specific model's license on its Hugging Face page before commercial deployment.

Question 15

What is the best open-source LLM in 2025?

Accepted Answer

Top open-source LLMs in 2025: Llama 3.3 70B (best overall quality), Qwen 2.5 72B (strong multilingual and coding), Mistral Large (efficient), DeepSeek R1 (reasoning), Phi-4 (small but powerful). The best choice depends on your VRAM and use case.

Question 16

Can I run AI models without a GPU?

Accepted Answer

Yes, using CPU-only inference with llama.cpp or Ollama. A 7B Q4 model needs approximately 8 GB RAM and runs at 2-5 tokens per second on a modern CPU. Even a mid-range GPU (RTX 3060 12 GB) dramatically improves speed. Apple Silicon Macs use unified memory efficiently.

Question 17

Llama 和 GPT-4 有什麼分別？

Accepted Answer

Llama（Meta）是開源免費的，可下載權重在本地自行部署，資料完全私密。GPT-4（OpenAI）是閉源商業模型，只能透過付費 API 呼叫。Llama 3.3 70B 在多項基準測試上接近 GPT-4 水準，且完全免費本地運行。

Question 18

如何在 Windows 上安裝和使用 Ollama？

Accepted Answer

從 ollama.com 下載 Windows 安裝包，安裝後開啟 PowerShell 或命令提示字元，輸入：ollama run llama3.2（或其他模型名稱）。Ollama 會自動下載模型並啟動本地對話。如需圖形介面，可搭配 Open WebUI 或 LM Studio 使用。

Question 19

2025 年最好的開源大語言模型是哪個？

Accepted Answer

2025 年頂級開源 LLM：Llama 3.3 70B（綜合品質最佳）、Qwen 2.5 72B（多語言和程式碼能力強）、Mistral Large（高效）、DeepSeek R1（推理能力強）、Phi-4（小模型中的佼佼者）。最佳選擇取決於你的顯存大小和使用場景。

Question 20

What is the difference between Ollama, LM Studio, and vLLM?

Accepted Answer

Ollama: best for quick local deployment via CLI, supports most GGUF models. LM Studio: GUI-based, great for beginners on Windows/Mac, no command line needed. vLLM: high-throughput server for production use, supports OpenAI-compatible API, requires Linux/GPU. Choose Ollama or LM Studio for personal use; vLLM for serving multiple users.

Question 21

Is running AI models locally private and secure?

Accepted Answer

Yes - local AI models run entirely on your own hardware with no data sent to external servers. Your conversations, documents, and prompts never leave your machine. This makes local AI ideal for sensitive business data, personal privacy, and offline use cases.

Question 22

What is the best open-source AI model for coding?

Accepted Answer

Top coding models: Qwen 2.5 Coder 32B (best overall), DeepSeek Coder V2 (strong reasoning), CodeLlama 70B (Meta), Phi-4 (efficient). For local use with limited VRAM, Qwen 2.5 Coder 7B or 14B are excellent choices.

Question 23

What is the best open-source AI model for Chinese language?

Accepted Answer

Best Chinese-capable open-source models: Qwen 2.5 (Alibaba, best Chinese and English), DeepSeek V3 (strong Chinese reasoning), Yi-1.5 (01.AI), Baichuan 2. Qwen 2.5 72B is widely regarded as the top choice for Chinese NLP tasks.

Question 24

How do I run Stable Diffusion locally?

Accepted Answer

Use ComfyUI (most flexible, node-based workflow) or AUTOMATIC1111 WebUI (beginner-friendly). Download a model checkpoint (SDXL, FLUX) from Hugging Face or CivitAI. Minimum: 6 GB VRAM for SDXL; 8 GB+ recommended for FLUX. Runs on Windows, Mac (MPS), or Linux.

Question 25

How do I run AI models on Linux?

Accepted Answer

Install Ollama: curl -fsSL https://ollama.com/install.sh | sh, then run: ollama run llama3.2. For GPU acceleration, ensure NVIDIA drivers and CUDA are installed. vLLM is excellent for Linux server deployment: pip install vllm, then python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-Instruct.

Question 26

What is RAG (Retrieval-Augmented Generation)?

Accepted Answer

RAG combines a local AI model with a document retrieval system, allowing the model to answer questions based on your own files (PDFs, docs, notes). Popular tools: Ollama + Open WebUI (built-in RAG), LlamaIndex, LangChain, AnythingLLM. This lets you build a private ChatGPT over your own knowledge base.

Question 27

How do I compare two AI models side by side?

Accepted Answer

Use OSAI Centre model cards to compare VRAM requirements, benchmark scores, parameter counts, and deployment tools. For live comparison, try LM Studio (load two models and chat with both), or use the Open LLM Leaderboard on Hugging Face for standardized benchmark comparisons.

Question 28

Ollama、LM Studio 和 vLLM 有什麼區別？

Accepted Answer

Ollama：命令行快速本地部署，支援大多數 GGUF 模型。LM Studio：圖形介面，適合 Windows/Mac 初學者，無需命令行。vLLM：高吐吐量生產伺服器，支援 OpenAI 相容 API，需要 Linux/GPU。個人使用選 Ollama 或 LM Studio；服务多用戶選 vLLM。

Question 29

本地運行 AI 模型安全私密嗎？

Accepted Answer

本地 AI 模型完全在你自己的硬體上運行，沒有任何資料傳送到外部伺服器。你的對話、文件和提示詞永遠不會離開你的機器。這使得本地 AI 非常適合處理敗感業務資料、個人隱私和離線場景。

Question 30

最適合寫程式碼的開源 AI 模型是哪個？

Accepted Answer

頂級程式碼模型：Qwen 2.5 Coder 32B（綜合最佳）、DeepSeek Coder V2（推理能力強）、CodeLlama 70B（Meta）、Phi-4（高效）。VRAM 有限時，Qwen 2.5 Coder 7B 或 14B 是性價比最高的選擇。

Question 31

最適合中文的開源 AI 模型是哪個？

Accepted Answer

最佳中文開源模型：Qwen 2.5（阿里巴巴，中英文最強）、DeepSeek V3（中文推理能力強）、Yi-1.5（01.AI）、Baichuan 2。Qwen 2.5 72B 被廣泛認為是中文 NLP 任務的首選。

Question 32

如何在本地運行 Stable Diffusion？

Accepted Answer

使用 ComfyUI（最靈活，節點式工作流）或 AUTOMATIC1111 WebUI（初學者友好）。從 Hugging Face 或 CivitAI 下載模型檔案（SDXL、FLUX）。最低需求：SDXL 需 6 GB VRAM；FLUX 建議 8 GB+。支援 Windows、Mac（MPS）和 Linux。

Question 33

如何在 Linux 上運行 AI 模型？

Accepted Answer

安裝 Ollama：curl -fsSL https://ollama.com/install.sh | sh，然後執行：ollama run llama3.2。GPU 加速需確保安裝 NVIDIA 驅動和 CUDA。vLLM 適合 Linux 伺服器部署：pip install vllm，然後 python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-Instruct。

Question 34

什麼是 RAG（檢索增強生成）？

Accepted Answer

RAG 將本地 AI 模型與文件檢索系統結合，讓模型基於你自己的檔案（PDF、文件、筆記）回答問題。常用工具：Ollama + Open WebUI（內建 RAG）、LlamaIndex、LangChain、AnythingLLM。這讓你能基於自己的知識庫建立私有 ChatGPT。

量化	4K	8K	16K	32K	64K	128K
Q4_K_M★	7.0 GB	7.4 GB	8.2 GB	9.8 GB	13 GB	19 GB
Q8_0	13 GB	13 GB	14 GB	16 GB	19 GB	25 GB

Nemotron 3 Super (120B A12B)

VRAM 計算器

部署指南

方法二：llama.cpp

方法三：vLLM（高效能伺服器）

規格

模型強項

推薦用途

標籤

部署工具

關於 Nemotron 3 Super (120B A12B) — 開源 AI 模型

使用心得