1:04
How to Save 80% VRAM using INT4 and AWQ Quantization
58 views
1 week ago
YouTube
Breaking Divide
26:21
How to Quantize an LLM with GGUF or AWQ
13.9K views
Oct 3, 2023
YouTube
Trelis Research
1:00
What is AWQ-INT4? Understanding Quantization Levels
1 week ago
YouTube
Breaking Divide
20:40
AWQ for LLM Quantization
12.9K views
Oct 25, 2023
YouTube
MIT HAN Lab
5:05
SAW-INT4: 4-Bit KV-Cache Quantization for LLMs
24 views
1 month ago
YouTube
AI Research Roundup
4:47
AI Model Quantization: The Complete Guide — FP32 to Q4_K_M
49 views
3 months ago
YouTube
Michel Laclé
59:04
LLM Quantization Techniques Explained - GPTQ AWQ GGUF HQQ BitNet
584 views
10 months ago
YouTube
Joydeep Bhattacharjee
30:14
LLM Quantization Explained: GPTQ, AWQ, QLoRA, GGUF and More
1.2K views
2 months ago
YouTube
Tales Of Tensors
25:26
Quantize LLMs with AWQ: Faster and Smaller Llama 3
7.2K views
Apr 26, 2024
YouTube
AI Anytime
11:11
Day 65/75 LLM Quantization Techniques [GPTQ - AWQ - BitsandBytes NF4] Python | Hugging Face GenAI
1.4K views
Apr 16, 2024
YouTube
FreeBirds Crew - Data Science and GenAI
22:49
Double Inference Speed with AWQ Quantization
3.4K views
Sep 26, 2023
YouTube
Trelis Research
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration | GetMobile: Mobile Computing and Communications
Jan 21, 2025
acm.org
0:51
TinyChat Computer running Llama2-7B Jetson Orin Nano. Key technique: AWQ 4bit quantization.
3.9K views
Apr 16, 2024
YouTube
MIT HAN Lab
2:12:21
LLM Fine-Tuning 12: LLM Quantization Explained (PART 1) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp
7.7K views
9 months ago
YouTube
Sunny Savita
9:57
What is LLM Quantization?
3.2K views
Mar 19, 2025
YouTube
New Machina
18:57
MLSys'24 Best Paper - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
4.5K views
Jun 6, 2024
YouTube
MIT HAN Lab
5:30
Quantization Explained Why INT4 Powers Edge LLMs — Gemma Series Part 5
6 days ago
YouTube
The Stack Underflow
3:21:13
LLM Fine-Tuning 13: LLM Quantization Explained (PART 2) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp
5.1K views
8 months ago
YouTube
Sunny Savita
0:24
The Trauma Code: K-Drama Highlights and Insights
2.2K views
3 months ago
TikTok
kdram_awq
0:25
embarrassing myself #fyp #foryou #rdr2 #superlinda #dance
12.4K views
3 months ago
TikTok
awq.sim
12:10
Optimize Your AI - Quantization Explained
465.1K views
Dec 28, 2024
YouTube
Matt Williams
23:12
[LLMs inference] Quantization: A General Overview (bitsandbytes, GPTQ, GGUF, AWQ)
12.1K views
Jul 21, 2024
bilibili
五道口纳什
8:14
The Complete Guide to LLM Quantization & Model Compression: The Magic of Compressing a 70B Model to 4GB! GPTQ, AWQ, GGUF Roundup
807 views
5 months ago
YouTube
DoYouKnow
0:48
Saudi Samri: Going to Bisha and New Trends
142.1K views
Dec 5, 2024
TikTok
awq_88
10:51
AWQ Large-Model Quantization: INT4 Inference Is 2x Faster Than FP16 with 1/3 the GPU Memory
4.3K views
Nov 4, 2023
bilibili
小工蚁创始人
0:28
#طرب #لعب #رايح #رايح_بيشه #سامري #سامري_طرب #سامري_ثقيل #السعودية #الرياض #اكسبلور_تيك_توك #ترند #ترند_تيك_توك
310.3K views
May 22, 2024
TikTok
awq_88
15:51
Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)
39.4K views
Nov 23, 2023
YouTube
Maarten Grootendorst
2:08
#طرب #لعب #سامري #سامري_دواسر #سامري_ثقيل #السعودية #الرياض #رايح #رايح_بيشه #بيشه #الطائف #وادي_الدواسر #الافلاج #الخرج#اكسبلور_تيك_توك #ترند #ترند_تيك_توك
1.2M views
May 28, 2024
TikTok
awq_88
1:46
#طرب #لعب #رايح #رايح_بيشه #سامري #سامري_ثقيل #السعودية #بيشه #الرياض #الطائف #وادي_الدواسر #الافلاج #الخرج #اكسبلور_تيك_توك #ترند #ترند_تيك_توك
56.2K views
Jun 27, 2024
TikTok
awq_88
0:27
Reasoning LLMs generate very long chains-of-thought, so even small quantization errors add up. With AWQ, Qwen3-4B drops 71.0 → 68.2 on MMLU-Pro (~4% relative loss). 😬 ParoQuant fixes this! It keeps only the critical rotation pairs and fuses everything into a single kernel. Recovers most of the lost reasoning accuracy with minimal overhead, so 4-bit models stay strong at reasoning. 💪💪
169.1K views
2 months ago
x.com
Zhijian Liu