Spaces: Alovestocode/ZeroGPU-LLM-Inference
Branch: main
ZeroGPU-LLM-Inference, 165 kB, 1 contributor
History: 83 commits
Latest commit: Alikestocode, "Add GPU estimator, DDG search, and cancel support" (4ce42e8), about 1 month ago
| File | Size | Last commit message | Updated |
| --- | --- | --- | --- |
| .dockerignore | 104 Bytes | Add Google Cloud Platform deployment configurations | about 1 month ago |
| .gitattributes | 1.52 kB | Initial commit: ZeroGPU LLM Inference Space | about 1 month ago |
| .gitignore | 27 Bytes | Add .gitignore and remove cache files | about 1 month ago |
| DEPLOYMENT_STATUS.md | 2.21 kB | Add deployment status document after re-authentication | about 1 month ago |
| Dockerfile | 1.02 kB | Fix delete_revisions import with fallback cache cleanup | about 1 month ago |
| FIX_PERMISSIONS.md | 2.05 kB | Add permission fix guide for spherical-gate-477614-q7 project | about 1 month ago |
| LLM_COMPRESSOR_FEATURES.md | 6.24 kB | Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization | about 1 month ago |
| MANUAL_DEPLOY.md | 1.59 kB | Fix delete_revisions import with fallback cache cleanup | about 1 month ago |
| QUANTIZE_AWQ.md | 3.21 kB | Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization | about 1 month ago |
| README.md | 4.23 kB | Implement vLLM with LLM Compressor and performance optimizations | about 1 month ago |
| app.py | 56.9 kB | Add GPU estimator, DDG search, and cancel support | about 1 month ago |
| apt.txt | 11 Bytes | Initial commit: ZeroGPU LLM Inference Space | about 1 month ago |
| cloudbuild.yaml | 1.36 kB | Add Cloud Build deployment script and permission setup helper | about 1 month ago |
| deploy-cloud-build.sh | 3.31 kB | Add Cloud Build deployment script and permission setup helper | about 1 month ago |
| deploy-compute-engine.sh | 4.23 kB | Add Google Cloud Platform deployment configurations | about 1 month ago |
| deploy-gcp.sh | 2.67 kB | Add Google Cloud Platform deployment configurations | about 1 month ago |
| gcp-deployment.md | 5.32 kB | Add Google Cloud Platform deployment configurations | about 1 month ago |
| quantize_to_awq_colab.ipynb | 32.9 kB | Lower Gemma AWQ group size to 16 | about 1 month ago |
| requirements.txt | 397 Bytes | Clarify LLM Compressor optional status - vLLM has native AWQ support | about 1 month ago |
| setup-gcp-permissions.sh | 1.8 kB | Add Cloud Build deployment script and permission setup helper | about 1 month ago |
| style.css | 2.84 kB | Initial commit: ZeroGPU LLM Inference Space | about 1 month ago |
| test_api.py | 3.43 kB | Migrate to AWQ quantization with FlashAttention-2 | about 1 month ago |
| test_api_gradio_client.py | 7.2 kB | Implement vLLM with LLM Compressor and performance optimizations | about 1 month ago |
| test_awq_models.py | 3.12 kB | Add test scripts for AWQ models on ZeroGPU Space | about 1 month ago |
| test_quantization_notebook.py | 9.84 kB | Update Qwen model to use AWQ quantized version | about 1 month ago |
| test_space_awq.sh | 1.93 kB | Add test scripts for AWQ models on ZeroGPU Space | about 1 month ago |
| test_space_simple.py | 3.49 kB | Add test scripts for AWQ models on ZeroGPU Space | about 1 month ago |
| test_space_simple.sh | 1.68 kB | Fix delete_revisions import with fallback cache cleanup | about 1 month ago |