# TenderHub WebAI Verification Worker

A secondary verification layer for tender document processing, built on the webAI-ColVec1-4b multimodal model. This worker provides an alternative analysis pipeline that cross-validates the primary worker's results.
## Architecture Overview
This worker uses a different approach than the primary worker:
- Vision-Language Model: webAI-ColVec1-4b for direct document understanding
- ZeroGPU Deployment: Leverages HF Spaces ZeroGPU for on-demand GPU access
- Memory Optimization: 8-bit quantization + FlashAttention-2 for minimal memory overhead
- Verification Logic: Cross-compares results with primary worker
## Processing Pipeline

1. Document Ingestion: Same document retrieval as the primary worker
2. Vision Analysis: Direct image/text processing with webAI-ColVec1-4b
3. Structured Extraction: Multimodal understanding for tender analysis
4. Comparison Engine: Cross-validation against primary worker results
5. Confidence Scoring: Agreement/disagreement metrics
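The five stages above can be sketched as a chain of plain functions. This is a minimal illustration: the function names, dict payloads, and field names are assumptions, not the worker's actual interfaces.

```python
# Illustrative pipeline skeleton: each stage takes and returns a plain dict.
# Stage names mirror the pipeline list above; real interfaces may differ.

def ingest(job: dict) -> dict:
    # Document Ingestion: fetch pages for the tender referenced by the job.
    return {**job, "pages": [f"page-{i}" for i in range(job.get("n_pages", 1))]}

def analyze(job: dict) -> dict:
    # Vision Analysis: placeholder for webAI-ColVec1-4b inference per page.
    return {**job, "raw": [f"analysis of {p}" for p in job["pages"]]}

def extract(job: dict) -> dict:
    # Structured Extraction: turn raw model output into structured fields.
    return {**job, "analysis": {"tenderTitle": "Example", "pages": len(job["raw"])}}

def compare(job: dict, primary: dict) -> dict:
    # Comparison Engine: cross-validate against the primary worker's result.
    agree = job["analysis"].get("tenderTitle") == primary.get("tenderTitle")
    return {**job, "comparison": {"title_match": agree}}

def score(job: dict) -> dict:
    # Confidence Scoring: reduce comparisons to a single agreement metric.
    matches = sum(job["comparison"].values())
    return {**job, "agreement_score": matches / len(job["comparison"])}

job = score(compare(extract(analyze(ingest({"tender_id": "t-1", "n_pages": 2}))),
                    {"tenderTitle": "Example"}))
print(job["agreement_score"])  # 1.0 when the two workers agree on every field
```

Passing one dict through every stage keeps each step independently testable and makes it easy to log intermediate state between stages.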
## Deployment Strategy
- Platform: Hugging Face Spaces with ZeroGPU
- Memory Management: 8-bit quantization + CPU fallback
- Scaling: On-demand GPU allocation for processing tasks
- Cost: Free tier with dynamic GPU provisioning
## Key Differences from Primary Worker
- Model Architecture: Vision-language vs text-only pipeline
- Processing Approach: End-to-end multimodal vs staged extraction
- Validation: Cross-model verification vs single-model processing
- Memory Strategy: GPU-accelerated vs CPU-optimized
## Integration Points
- Database: Reads from same processing_jobs table
- Storage: Shared Supabase document access
- Results: Stores verification metrics and comparisons
- API: Compatible job processing interface
## Deployment Instructions
### 1. Create HF Space

```bash
# Create a new private Gradio Space on Hugging Face
huggingface-cli repo create tenderhub-webai-verification \
  --type space \
  --space_sdk gradio \
  --private
# Select hardware (cpu-basic to start, ZeroGPU later) in the Space settings.
```
### 2. Environment Variables

Set these in your HF Space settings:

```bash
DATABASE_URL=postgresql://user:pass@host:port/db
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_STORAGE_BUCKET=tender-documents
```
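A small loader can validate these four variables at startup so the worker fails fast with a clear error instead of crashing mid-job. This is a sketch: `load_config` is an illustrative helper, not part of the worker's actual code.

```python
import os

# The four variables listed above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "NEXT_PUBLIC_SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "SUPABASE_STORAGE_BUCKET",
)

def load_config(env=os.environ) -> dict:
    """Collect required settings, failing fast with a clear message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}

# Example with an explicit mapping instead of the real environment:
cfg = load_config({
    "DATABASE_URL": "postgresql://user:pass@host:5432/db",
    "NEXT_PUBLIC_SUPABASE_URL": "https://your-project.supabase.co",
    "SUPABASE_SERVICE_ROLE_KEY": "service-role-key",
    "SUPABASE_STORAGE_BUCKET": "tender-documents",
})
print(cfg["SUPABASE_STORAGE_BUCKET"])  # tender-documents
```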
### 3. Memory Optimization

The worker automatically applies several OOM prevention strategies:

- 8-bit Quantization: Roughly halves the 4B model's footprint (~8GB in fp16 to ~4GB) while losing less quality than 4-bit
- FlashAttention-2: Optimized attention mechanism with minimal memory overhead
- Adaptive DPI: High DPI (200-300) for better extraction with memory-aware scaling
- CPU Loading: Model loads on CPU, moves to GPU only during inference
- Batch Size 1: Processes one document at a time
- Aggressive Memory Cleanup: Manual garbage collection after each document to prevent ghost memory
- Image Resizing: Optimized to 336x336 for webAI models
### 4. Memory Cleanup

Vision tensors can leave 4GB+ of "ghost memory" due to Python's lazy garbage collection. The worker implements aggressive cleanup:

Cleanup Strategy:
- GPU Cache Clearing: Multiple passes of `torch.cuda.empty_cache()`
- CUDA Synchronization: Ensures all GPU operations complete before cleanup
- Python GC: 3-generation garbage collection with multiple passes
- PIL Cache: Clears image processing caches
- Memory Monitoring: Tracks memory freed and cleanup effectiveness

Cleanup Triggers:
- After every document is processed
- After WebAI model inference
- On processing failures (cleanup runs even on errors)
- Manual cleanup available via `aggressive_memory_cleanup()`
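A minimal sketch of such a cleanup routine, using the `aggressive_memory_cleanup()` name mentioned above. The body is illustrative, not the worker's exact implementation (the PIL cache step is omitted), and the `torch` import is guarded so the sketch also runs on CPU-only machines.

```python
import gc

try:
    import torch  # optional: CUDA steps only run when a GPU is present
except ImportError:
    torch = None

def aggressive_memory_cleanup(passes: int = 3) -> int:
    """Run the cleanup steps described above; returns objects collected by gc.

    Illustrative sketch of the hook named in this README, not the worker's
    exact implementation.
    """
    collected = 0
    for _ in range(passes):
        if torch is not None and torch.cuda.is_available():
            torch.cuda.synchronize()   # ensure all GPU operations complete
            torch.cuda.empty_cache()   # release cached CUDA allocations
        collected += gc.collect()      # full 3-generation Python GC pass
    return collected

freed = aggressive_memory_cleanup()
print(freed >= 0)  # True
```

Calling this in a `finally` block around document processing is what guarantees the "cleanup even on errors" trigger above.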
Monitoring:

```bash
# Monitor cleanup effectiveness
grep "memory.cleanup" /var/log/app.log | jq '.memory_freed_gb'

# Track ghost memory prevention
grep "memory_freed_gb" /var/log/app.log | awk '{sum+=$2} END {print "Total freed: " sum "GB"}'
```
### 5. DPI Configuration

High DPI (200-300) significantly improves extraction quality for messy documents:

Memory Impact Analysis (relative to a 100 DPI baseline):
- 200 DPI: 4x larger images (1.2MB each)
- 300 DPI: 9x larger images (2.7MB each)
- Memory Impact: 4-9x increase during processing
- Quality Impact: Dramatically better text recognition in complex documents
Adaptive DPI Scaling:
- 12GB+ Memory: 300 DPI (maximum quality)
- 8GB+ Memory: 250 DPI (high quality)
- 4GB+ Memory: 200 DPI (medium quality)
- <4GB Memory: 150 DPI (conservative)
Configuration Options:

```bash
# Set maximum DPI (default: 200)
PDF_DPI=300

# Enable adaptive scaling (default: true)
ADAPTIVE_DPI=true
```
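The adaptive tiers and options above can be combined in a small selector. This is a sketch: `select_dpi` and its parameters are illustrative, with `max_dpi` and `adaptive` standing in for the `PDF_DPI` and `ADAPTIVE_DPI` settings.

```python
def select_dpi(available_gb: float, max_dpi: int = 300, adaptive: bool = True) -> int:
    """Pick a render DPI from free memory, per the tiers listed above."""
    if not adaptive:
        return max_dpi            # ADAPTIVE_DPI=false: always use the cap
    if available_gb >= 12:
        dpi = 300                 # maximum quality
    elif available_gb >= 8:
        dpi = 250                 # high quality
    elif available_gb >= 4:
        dpi = 200                 # medium quality
    else:
        dpi = 150                 # conservative
    return min(dpi, max_dpi)      # never exceed the PDF_DPI cap

print(select_dpi(16))               # 300
print(select_dpi(6))                # 200
print(select_dpi(16, max_dpi=200))  # 200
```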
### 6. Database Schema

Add verification tables to your PostgreSQL database:

```sql
-- WebAI verification results
CREATE TABLE public.webai_verifications (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tender_id UUID NOT NULL REFERENCES public.tenders(id),
  analysis JSONB NOT NULL,
  comparison JSONB NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

-- PostgreSQL does not support inline INDEX clauses in CREATE TABLE
CREATE INDEX webai_verifications_tender_id_idx
  ON public.webai_verifications (tender_id);

-- Add verification status to tenders
ALTER TABLE public.tenders
  ADD COLUMN verification_status TEXT DEFAULT 'PENDING',
  ADD COLUMN verification_score FLOAT DEFAULT 0.0;
```
## Usage

### Automatic Verification

The worker automatically processes verification jobs from the queue:

```sql
-- Queue a verification job
INSERT INTO public.processing_jobs (tender_id, job_type, payload)
VALUES ('tender-uuid', 'VERIFY', '{}');
```
### Manual Testing

Use the Gradio interface to test individual documents:

1. Upload a PDF or image document
2. Click "Verify Document"
3. Review the structured analysis output
### Verification Results

Access verification results via the database:

```sql
-- Get verification for a tender
SELECT
  tender_id,
  analysis->>'tenderTitle' AS title,
  comparison->>'agreement_score' AS agreement_score,
  comparison->'recommendation_comparison' AS bid_comparison,
  created_at
FROM public.webai_verifications
WHERE tender_id = 'your-tender-id';
```
## Comparison Metrics
The worker provides detailed comparison metrics:
- Agreement Score: 0.0-1.0 overall similarity
- Bid Decision Comparison: Primary vs WebAI recommendations
- Confidence Comparison: Model confidence differences
- Key Differences: Discrepancies requiring human review
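One plausible way to reduce field-level comparisons to the 0.0-1.0 agreement score is the fraction of shared fields on which the two analyses match. This is a sketch: the field names and the equal-weight metric are assumptions, not the worker's exact scoring.

```python
def agreement_score(primary: dict, webai: dict, fields=None) -> float:
    """Fraction of shared fields on which the two analyses agree (0.0-1.0)."""
    fields = fields or sorted(set(primary) & set(webai))
    if not fields:
        return 0.0
    matches = sum(primary[f] == webai[f] for f in fields)
    return matches / len(fields)

# Hypothetical extracted fields from each worker:
primary = {"tenderTitle": "Road works", "bidRecommendation": "BID",    "deadline": "2025-01-31"}
webai   = {"tenderTitle": "Road works", "bidRecommendation": "NO_BID", "deadline": "2025-01-31"}

score = agreement_score(primary, webai)
print(round(score, 2))  # 0.67 -- the bid recommendation disagrees and needs review
```

Fields where the two models disagree (here `bidRecommendation`) are exactly the "Key Differences" that should be surfaced for human review.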
## Monitoring

Monitor worker performance through structured logs:

```bash
# View recent verification logs
grep "webai-verification-worker" /var/log/app.log | tail -20

# Check agreement score distribution
grep "agreement_score" /var/log/app.log | jq '.agreement_score'
```
## Troubleshooting

### Common Issues

- OOM Errors: Check that 8-bit quantization is enabled
- Slow Processing: Verify ZeroGPU is working (check HF Space logs)
- Parsing Errors: WebAI responses may need post-processing
- Database Connection: Ensure `DATABASE_URL` is accessible from HF
### Performance Tips

- Use smaller images when possible
- Limit `max_new_tokens` to reduce memory usage
- Monitor GPU allocation in HF Space metrics
- Consider upgrading to a paid tier for higher throughput
## Cost Optimization
- Free Tier: ~20-30 documents/hour with 4B model, FlashAttention-2, and adaptive DPI
- Paid Tier: Linear scaling with GPU allocation
- Batch Processing: Queue multiple jobs for efficiency
- Caching: Reuse cached document embeddings when possible
- Memory Efficiency: FlashAttention-2 reduces attention memory by ~40%
- DPI Impact: High DPI reduces throughput by ~15-25% but dramatically improves quality
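Combining the quoted free-tier throughput with the high-DPI slowdown gives a rough planning envelope. This is arithmetic only; the 20-30 documents/hour and 15-25% figures come from the list above.

```python
def throughput_after_dpi(base_docs_per_hour: float, dpi_penalty: float) -> float:
    """Apply the quoted 15-25% high-DPI slowdown to a baseline throughput."""
    return base_docs_per_hour * (1 - dpi_penalty)

# Free-tier baseline from above: ~20-30 documents/hour.
low  = round(throughput_after_dpi(20, 0.25), 1)  # worst case
high = round(throughput_after_dpi(30, 0.15), 1)  # best case
print(low, high)  # 15.0 25.5
```

So with high DPI enabled, plan for roughly 15-25 documents/hour on the free tier.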