# TenderHub WebAI Verification Worker

A secondary verification layer for tender document processing, built on the webAI-ColVec1-4b multimodal model. This worker provides an alternative analysis pipeline that cross-validates the primary worker's results.
## Architecture Overview
This worker uses a different approach than the primary worker:
- Vision-Language Model: webAI-ColVec1-4b for direct document understanding
- ZeroGPU Deployment: Leverages HF Spaces ZeroGPU for on-demand GPU access
- Memory Optimization: 8-bit quantization + FlashAttention-2 for minimal memory overhead
- Verification Logic: Cross-compares results with primary worker
## Processing Pipeline

1. Document Ingestion: Same document retrieval as the primary worker
2. Vision Analysis: Direct image/text processing with webAI-ColVec1-4b
3. Structured Extraction: Multimodal understanding for tender analysis
4. Comparison Engine: Cross-validation against primary worker results
5. Confidence Scoring: Agreement/disagreement metrics
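The five stages above can be sketched as a chain of plain functions. This is a minimal illustration: the function names, dict payloads, and field names are assumptions, not the worker's actual interfaces.

```python
# Illustrative pipeline skeleton: each stage takes and returns a plain dict.
# Stage names mirror the pipeline list above; real interfaces may differ.

def ingest(job: dict) -> dict:
    # Document Ingestion: fetch pages for the tender referenced by the job.
    return {**job, "pages": [f"page-{i}" for i in range(job.get("n_pages", 1))]}

def analyze(job: dict) -> dict:
    # Vision Analysis: placeholder for webAI-ColVec1-4b inference per page.
    return {**job, "raw": [f"analysis of {p}" for p in job["pages"]]}

def extract(job: dict) -> dict:
    # Structured Extraction: turn raw model output into structured fields.
    return {**job, "analysis": {"tenderTitle": "Example", "pages": len(job["raw"])}}

def compare(job: dict, primary: dict) -> dict:
    # Comparison Engine: cross-validate against the primary worker's result.
    agree = job["analysis"].get("tenderTitle") == primary.get("tenderTitle")
    return {**job, "comparison": {"title_match": agree}}

def score(job: dict) -> dict:
    # Confidence Scoring: reduce comparisons to a single agreement metric.
    matches = sum(job["comparison"].values())
    return {**job, "agreement_score": matches / len(job["comparison"])}

job = score(compare(extract(analyze(ingest({"tender_id": "t-1", "n_pages": 2}))),
                    {"tenderTitle": "Example"}))
print(job["agreement_score"])  # 1.0 when the two workers agree on every field
```

Passing one dict through every stage keeps each step independently testable and makes it easy to log intermediate state between stages.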
## Deployment Strategy
- Platform: Hugging Face Spaces with ZeroGPU
- Memory Management: 8-bit quantization + CPU fallback
- Scaling: On-demand GPU allocation for processing tasks
- Cost: Free tier with dynamic GPU provisioning
## Key Differences from Primary Worker
- Model Architecture: Vision-language vs text-only pipeline
- Processing Approach: End-to-end multimodal vs staged extraction
- Validation: Cross-model verification vs single-model processing
- Memory Strategy: GPU-accelerated vs CPU-optimized
## Integration Points
- Database: Reads from same processing_jobs table
- Storage: Shared Supabase document access
- Results: Stores verification metrics and comparisons
- API: Compatible job processing interface
## Deployment Instructions
### 1. Create HF Space

```bash
# Create a new private Gradio Space on Hugging Face
huggingface-cli repo create tenderhub-webai-verification \
  --type space \
  --space_sdk gradio \
  --private
# Select hardware (cpu-basic to start, ZeroGPU later) in the Space settings.
```
### 2. Environment Variables

Set these in your HF Space settings:

```bash
DATABASE_URL=postgresql://user:pass@host:port/db
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_STORAGE_BUCKET=tender-documents
```
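A small loader can validate these four variables at startup so the worker fails fast with a clear error instead of crashing mid-job. This is a sketch: `load_config` is an illustrative helper, not part of the worker's actual code.

```python
import os

# The four variables listed above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "NEXT_PUBLIC_SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "SUPABASE_STORAGE_BUCKET",
)

def load_config(env=os.environ) -> dict:
    """Collect required settings, failing fast with a clear message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}

# Example with an explicit mapping instead of the real environment:
cfg = load_config({
    "DATABASE_URL": "postgresql://user:pass@host:5432/db",
    "NEXT_PUBLIC_SUPABASE_URL": "https://your-project.supabase.co",
    "SUPABASE_SERVICE_ROLE_KEY": "service-role-key",
    "SUPABASE_STORAGE_BUCKET": "tender-documents",
})
print(cfg["SUPABASE_STORAGE_BUCKET"])  # tender-documents
```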
### 3. Memory Optimization

The worker automatically applies several OOM prevention strategies:

- 8-bit Quantization: Roughly halves the 4B model's footprint (~8GB in fp16 to ~4GB) while losing less quality than 4-bit
- FlashAttention-2: Optimized attention mechanism with minimal memory overhead
- Adaptive DPI: High DPI (200-300) for better extraction with memory-aware scaling
- CPU Loading: Model loads on CPU, moves to GPU only during inference
- Batch Size 1: Processes one document at a time
- Aggressive Memory Cleanup: Manual garbage collection after each document to prevent ghost memory
- Image Resizing: Optimized to 336x336 for webAI models
### 4. Memory Cleanup

Vision tensors can leave 4GB+ of "ghost memory" due to Python's lazy garbage collection. The worker implements aggressive cleanup:

Cleanup Strategy:
- GPU Cache Clearing: Multiple passes of `torch.cuda.empty_cache()`
- CUDA Synchronization: Ensures all GPU operations complete before cleanup
- Python GC: 3-generation garbage collection with multiple passes
- PIL Cache: Clears image processing caches
- Memory Monitoring: Tracks memory freed and cleanup effectiveness

Cleanup Triggers:
- After every document is processed
- After WebAI model inference
- On processing failures (cleanup runs even on errors)
- Manual cleanup available via `aggressive_memory_cleanup()`
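A minimal sketch of such a cleanup routine, using the `aggressive_memory_cleanup()` name mentioned above. The body is illustrative, not the worker's exact implementation (the PIL cache step is omitted), and the `torch` import is guarded so the sketch also runs on CPU-only machines.

```python
import gc

try:
    import torch  # optional: CUDA steps only run when a GPU is present
except ImportError:
    torch = None

def aggressive_memory_cleanup(passes: int = 3) -> int:
    """Run the cleanup steps described above; returns objects collected by gc.

    Illustrative sketch of the hook named in this README, not the worker's
    exact implementation.
    """
    collected = 0
    for _ in range(passes):
        if torch is not None and torch.cuda.is_available():
            torch.cuda.synchronize()   # ensure all GPU operations complete
            torch.cuda.empty_cache()   # release cached CUDA allocations
        collected += gc.collect()      # full 3-generation Python GC pass
    return collected

freed = aggressive_memory_cleanup()
print(freed >= 0)  # True
```

Calling this in a `finally` block around document processing is what guarantees the "cleanup even on errors" trigger above.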
Monitoring:

```bash
# Monitor cleanup effectiveness
grep "memory.cleanup" /var/log/app.log | jq '.memory_freed_gb'

# Track ghost memory prevention
grep "memory_freed_gb" /var/log/app.log | awk '{sum+=$2} END {print "Total freed: " sum "GB"}'
```
### 5. DPI Configuration

High DPI (200-300) significantly improves extraction quality for messy documents:

Memory Impact Analysis (relative to a 100 DPI baseline):
- 200 DPI: 4x larger images (1.2MB each)
- 300 DPI: 9x larger images (2.7MB each)
- Memory Impact: 4-9x increase during processing
- Quality Impact: Dramatically better text recognition in complex documents
Adaptive DPI Scaling:
- 12GB+ Memory: 300 DPI (maximum quality)
- 8GB+ Memory: 250 DPI (high quality)
- 4GB+ Memory: 200 DPI (medium quality)
- <4GB Memory: 150 DPI (conservative)
Configuration Options:

```bash
# Set maximum DPI (default: 200)
PDF_DPI=300

# Enable adaptive scaling (default: true)
ADAPTIVE_DPI=true
```
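The adaptive tiers and options above can be combined in a small selector. This is a sketch: `select_dpi` and its parameters are illustrative, with `max_dpi` and `adaptive` standing in for the `PDF_DPI` and `ADAPTIVE_DPI` settings.

```python
def select_dpi(available_gb: float, max_dpi: int = 300, adaptive: bool = True) -> int:
    """Pick a render DPI from free memory, per the tiers listed above."""
    if not adaptive:
        return max_dpi            # ADAPTIVE_DPI=false: always use the cap
    if available_gb >= 12:
        dpi = 300                 # maximum quality
    elif available_gb >= 8:
        dpi = 250                 # high quality
    elif available_gb >= 4:
        dpi = 200                 # medium quality
    else:
        dpi = 150                 # conservative
    return min(dpi, max_dpi)      # never exceed the PDF_DPI cap

print(select_dpi(16))               # 300
print(select_dpi(6))                # 200
print(select_dpi(16, max_dpi=200))  # 200
```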
### 6. Database Schema

Add verification tables to your PostgreSQL database:

```sql
-- WebAI verification results
CREATE TABLE public.webai_verifications (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tender_id UUID NOT NULL REFERENCES public.tenders(id),
  analysis JSONB NOT NULL,
  comparison JSONB NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

-- PostgreSQL does not support inline INDEX clauses in CREATE TABLE
CREATE INDEX webai_verifications_tender_id_idx
  ON public.webai_verifications (tender_id);

-- Add verification status to tenders
ALTER TABLE public.tenders
  ADD COLUMN verification_status TEXT DEFAULT 'PENDING',
  ADD COLUMN verification_score FLOAT DEFAULT 0.0;
```
## Usage

### Automatic Verification

The worker automatically processes verification jobs from the queue:

```sql
-- Queue a verification job
INSERT INTO public.processing_jobs (tender_id, job_type, payload)
VALUES ('tender-uuid', 'VERIFY', '{}');
```
### Manual Testing

Use the Gradio interface to test individual documents:

1. Upload a PDF or image document
2. Click "Verify Document"
3. Review the structured analysis output
### Verification Results

Access verification results via the database:

```sql
-- Get verification for a tender
SELECT
  tender_id,
  analysis->>'tenderTitle' AS title,
  comparison->>'agreement_score' AS agreement_score,
  comparison->'recommendation_comparison' AS bid_comparison,
  created_at
FROM public.webai_verifications
WHERE tender_id = 'your-tender-id';
```
## Comparison Metrics
The worker provides detailed comparison metrics:
- Agreement Score: 0.0-1.0 overall similarity
- Bid Decision Comparison: Primary vs WebAI recommendations
- Confidence Comparison: Model confidence differences
- Key Differences: Discrepancies requiring human review
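One plausible way to reduce field-level comparisons to the 0.0-1.0 agreement score is the fraction of shared fields on which the two analyses match. This is a sketch: the field names and the equal-weight metric are assumptions, not the worker's exact scoring.

```python
def agreement_score(primary: dict, webai: dict, fields=None) -> float:
    """Fraction of shared fields on which the two analyses agree (0.0-1.0)."""
    fields = fields or sorted(set(primary) & set(webai))
    if not fields:
        return 0.0
    matches = sum(primary[f] == webai[f] for f in fields)
    return matches / len(fields)

# Hypothetical extracted fields from each worker:
primary = {"tenderTitle": "Road works", "bidRecommendation": "BID",    "deadline": "2025-01-31"}
webai   = {"tenderTitle": "Road works", "bidRecommendation": "NO_BID", "deadline": "2025-01-31"}

score = agreement_score(primary, webai)
print(round(score, 2))  # 0.67 -- the bid recommendation disagrees and needs review
```

Fields where the two models disagree (here `bidRecommendation`) are exactly the "Key Differences" that should be surfaced for human review.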
## Monitoring

Monitor worker performance through structured logs:

```bash
# View recent verification logs
grep "webai-verification-worker" /var/log/app.log | tail -20

# Check agreement score distribution
grep "agreement_score" /var/log/app.log | jq '.agreement_score'
```
## Troubleshooting

### Common Issues

- OOM Errors: Check that 8-bit quantization is enabled
- Slow Processing: Verify ZeroGPU is working (check HF Space logs)
- Parsing Errors: WebAI responses may need post-processing
- Database Connection: Ensure `DATABASE_URL` is accessible from HF
### Performance Tips

- Use smaller images when possible
- Limit `max_new_tokens` to reduce memory usage
- Monitor GPU allocation in HF Space metrics
- Consider upgrading to a paid tier for higher throughput
## Cost Optimization
- Free Tier: ~20-30 documents/hour with 4B model, FlashAttention-2, and adaptive DPI
- Paid Tier: Linear scaling with GPU allocation
- Batch Processing: Queue multiple jobs for efficiency
- Caching: Reuse cached document embeddings when possible
- Memory Efficiency: FlashAttention-2 reduces attention memory by ~40%
- DPI Impact: High DPI reduces throughput by ~15-25% but dramatically improves quality
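Combining the quoted free-tier throughput with the high-DPI slowdown gives a rough planning envelope. This is arithmetic only; the 20-30 documents/hour and 15-25% figures come from the list above.

```python
def throughput_after_dpi(base_docs_per_hour: float, dpi_penalty: float) -> float:
    """Apply the quoted 15-25% high-DPI slowdown to a baseline throughput."""
    return base_docs_per_hour * (1 - dpi_penalty)

# Free-tier baseline from above: ~20-30 documents/hour.
low  = round(throughput_after_dpi(20, 0.25), 1)  # worst case
high = round(throughput_after_dpi(30, 0.15), 1)  # best case
print(low, high)  # 15.0 25.5
```

So with high DPI enabled, plan for roughly 15-25 documents/hour on the free tier.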