Awq Int4 - Search Videos

How to Save 80% VRAM using INT4 and AWQ Quantization

58 views · 1 week ago

YouTube · Breaking Divide

How to Quantize an LLM with GGUF or AWQ

13.9K views · Oct 3, 2023

YouTube · Trelis Research

What is AWQ-INT4? Understanding Quantization Levels

YouTube · Breaking Divide

AWQ for LLM Quantization

12.9K views · Oct 25, 2023

YouTube · MIT HAN Lab

SAW-INT4: 4-Bit KV-Cache Quantization for LLMs

24 views · 1 month ago

YouTube · AI Research Roundup

AI Model Quantization: The Complete Guide — FP32 to Q4_K_M

49 views · 3 months ago

YouTube · Michel Laclé

LLM Quantization Techniques Explained - GPTQ AWQ GGUF HQQ BitNet

584 views · 10 months ago

YouTube · Joydeep Bhattacharjee

LLM Quantization Explained: GPTQ, AWQ, QLoRA, GGUF and More

1.2K views · 2 months ago

YouTube · Tales Of Tensors

Quantize LLMs with AWQ: Faster and Smaller Llama 3

7.2K views · Apr 26, 2024

YouTube · AI Anytime

Day 65/75 LLM Quantization Techniques [GPTQ - AWQ - BitsandBytes NF4] Python | Hugging Face GenAI

1.4K views · Apr 16, 2024

YouTube · FreeBirds Crew - Data Science and GenAI

Double Inference Speed with AWQ Quantization

3.4K views · Sep 26, 2023

YouTube · Trelis Research

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration | GetMobile: Mobile Computing and Communications

TinyChat Computer running Llama2-7B Jetson Orin Nano. Key technique: AWQ 4bit quantization.

3.9K views · Apr 16, 2024

YouTube · MIT HAN Lab

LLM Fine-Tuning 12: LLM Quantization Explained (PART 1) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp

7.7K views · 9 months ago

YouTube · Sunny Savita

What is LLM Quantization?

3.2K views · Mar 19, 2025

YouTube · New Machina

MLSys'24 Best Paper - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

4.5K views · Jun 6, 2024

YouTube · MIT HAN Lab

Quantization Explained Why INT4 Powers Edge LLMs — Gemma Series Part 5

YouTube · The Stack Underflow

LLM Fine-Tuning 13: LLM Quantization Explained (PART 2) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp

5.1K views · 8 months ago

YouTube · Sunny Savita

The Trauma Code: K-Drama Highlights and Insights

2.2K views · 3 months ago

TikTok · kdram_awq

embarrassing myself #fyp #foryou #rdr2 #superlinda #dance

12.4K views · 3 months ago

Optimize Your AI - Quantization Explained

465.1K views · Dec 28, 2024

YouTube · Matt Williams

[LLMs inference] Quantization: an overall introduction (bitsandbytes, GPTQ, GGUF, AWQ)

12.1K views · Jul 21, 2024

bilibili · 五道口纳什

LLM Quantization & Lightweighting Complete Guide: The Magic of Compressing a 70B Model to 4GB! GPTQ, AWQ, GGUF Roundup

807 views · 5 months ago

YouTube · DoYouKnow

Saudi Samri: Going to Bisha and new trends

142.1K views · Dec 5, 2024

AWQ large-model quantization: INT4 inference is 2x faster than FP16, with 1/3 the GPU memory

4.3K views · Nov 4, 2023

bilibili · 小工蚁创始人

#طرب #لعب #رايح #رايح_بيشه #سامري #سامري_طرب #سامري_ثقيل #السعودية #الرياض #اكسبلور_تيك_توك #ترند #ترند_تيك_توك

310.3K views · May 22, 2024

Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)

39.4K views · Nov 23, 2023

YouTube · Maarten Grootendorst

#طرب #لعب #سامري #سامري_دواسر #سامري_ثقيل #السعودية #الرياض #رايح #رايح_بيشه #بيشه #الطائف #وادي_الدواسر #الافلاج #الخرج #اكسبلور_تيك_توك #ترند #ترند_تيك_توك

1.2M views · May 28, 2024

#طرب #لعب #رايح #رايح_بيشه #سامري #سامري_ثقيل #السعودية #بيشه #الرياض #الطائف #وادي_الدواسر #الافلاج #الخرج #اكسبلور_تيك_توك #ترند #ترند_تيك_توك

56.2K views · Jun 27, 2024

Reasoning LLMs generate very long chains-of-thought, so even small quantization errors add up. With AWQ, Qwen3-4B drops 71.0 → 68.2 on MMLU-Pro (~4% relative loss). 😬 ParoQuant fixes this! It keeps only the critical rotation pairs and fuses everything into a single kernel. Recovers most of the lost reasoning accuracy with minimal overhead, so 4-bit models stay strong at reasoning. 💪💪

169.1K views · 2 months ago

x.com · Zhijian Liu
