TenderHub WebAI Verification Worker

A secondary verification layer for tender document processing using the webAI-ColVec1-4b multimodal model. This worker provides an alternative analysis pipeline to cross-validate the primary worker's results.

Architecture Overview

This worker uses a different approach than the primary worker:

  • Vision-Language Model: webAI-ColVec1-4b for direct document understanding
  • ZeroGPU Deployment: Leverages HF Spaces ZeroGPU for on-demand GPU access
  • Memory Optimization: 8-bit quantization + FlashAttention-2 for minimal memory overhead
  • Verification Logic: Cross-compares results with primary worker

Processing Pipeline

  1. Document Ingestion: Same document retrieval as primary worker
  2. Vision Analysis: Direct image/text processing with webAI-ColVec1-4b
  3. Structured Extraction: Multimodal understanding for tender analysis
  4. Comparison Engine: Cross-validation with primary worker results
  5. Confidence Scoring: Agreement/disagreement metrics
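
The five stages above can be sketched as a simple orchestration function. This is an illustration only: the stage functions are injected as callables, since the worker's actual model and database code are not shown here.

```python
def verify_document(doc_id, fetch, analyze, extract, compare, score):
    """Run the five verification stages in order. Each stage is injected
    as a callable so the sketch stays independent of the real pipeline."""
    pages = fetch(doc_id)        # 1. document ingestion
    raw = analyze(pages)         # 2. vision analysis (webAI-ColVec1-4b)
    fields = extract(raw)        # 3. structured extraction
    diff = compare(fields)       # 4. comparison with the primary worker
    return score(diff)           # 5. confidence scoring
```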

Deployment Strategy

  • Platform: Hugging Face Spaces with ZeroGPU
  • Memory Management: 8-bit quantization + CPU fallback
  • Scaling: On-demand GPU allocation for processing tasks
  • Cost: Free tier with dynamic GPU provisioning

Key Differences from Primary Worker

  • Model Architecture: Vision-language vs text-only pipeline
  • Processing Approach: End-to-end multimodal vs staged extraction
  • Validation: Cross-model verification vs single-model processing
  • Memory Strategy: GPU-accelerated vs CPU-optimized

Integration Points

  • Database: Reads from same processing_jobs table
  • Storage: Shared Supabase document access
  • Results: Stores verification metrics and comparisons
  • API: Compatible job processing interface

Deployment Instructions

1. Create HF Space

# Create a new Space on Hugging Face (hardware tier and visibility
# can be adjusted later in the Space settings)
huggingface-cli repo create tenderhub-webai-verification \
  --type space \
  --space_sdk gradio

2. Environment Variables

Set these in your HF Space settings:

DATABASE_URL=postgresql://user:pass@host:port/db
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_STORAGE_BUCKET=tender-documents

3. Memory Optimization

The worker automatically applies several OOM prevention strategies:

  • 8-bit Quantization: Reduces the 4B model's footprint from ~8GB to ~4GB with minimal quality loss
  • FlashAttention-2: Optimized attention mechanism with minimal memory overhead
  • Adaptive DPI: High DPI (200-300) for better extraction with memory-aware scaling
  • CPU Loading: Model loads on CPU, moves to GPU only during inference
  • Batch Size 1: Processes one document at a time
  • Aggressive Memory Cleanup: Manual garbage collection after each document to prevent ghost memory
  • Image Resizing: Optimized to 336x336 for webAI models
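
As a rough illustration, the loading options above might be assembled like this. The exact keyword arguments passed to the real worker's `from_pretrained()` call are an assumption; treat this as a sketch, not the worker's actual configuration:

```python
def build_load_kwargs(use_8bit: bool = True, use_flash_attn: bool = True) -> dict:
    """Assemble model-loading kwargs implementing the strategies above:
    8-bit weights, FlashAttention-2, and CPU-first loading."""
    kwargs = {
        "device_map": "cpu",        # load on CPU, move to GPU only for inference
        "low_cpu_mem_usage": True,  # stream weights instead of a full in-RAM copy
    }
    if use_8bit:
        kwargs["load_in_8bit"] = True  # bitsandbytes 8-bit quantization
    if use_flash_attn:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs
```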

4. Memory Cleanup

Vision tensors can leave 4GB+ of "ghost memory" due to Python's lazy garbage collection. The worker implements aggressive cleanup:

Cleanup Strategy:

  • GPU Cache Clearing: Multiple passes of torch.cuda.empty_cache()
  • CUDA Synchronization: Ensures all GPU operations complete before cleanup
  • Python GC: 3-generation garbage collection with multiple passes
  • PIL Cache: Clears image processing caches
  • Memory Monitoring: Tracks memory freed and cleanup effectiveness

Cleanup Triggers:

  • After every document processing
  • After WebAI model inference
  • On processing failures (ensure cleanup even on errors)
  • Manual cleanup available via aggressive_memory_cleanup()
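
A minimal sketch of `aggressive_memory_cleanup()`, combining the GPU cache clearing, CUDA synchronization, and multi-pass garbage collection described above (the torch import is guarded so the sketch also runs CPU-only; the real implementation may differ):

```python
import gc

try:
    import torch  # optional: only needed when a GPU is present
except ImportError:
    torch = None

def aggressive_memory_cleanup() -> int:
    """Multi-pass cleanup to release 'ghost memory' left by vision tensors.
    Returns the number of objects collected by the Python GC."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()        # ensure GPU ops complete first
        for _ in range(2):
            torch.cuda.empty_cache()    # multiple cache-clearing passes
    collected = 0
    for _ in range(3):
        collected += gc.collect()       # 3-generation GC, multiple passes
    return collected
```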

Monitoring:

# Monitor cleanup effectiveness
grep "memory.cleanup" /var/log/app.log | jq '.memory_freed_gb'

# Track total ghost memory prevented (assumes JSON-formatted log lines)
grep "memory_freed_gb" /var/log/app.log | jq -s 'map(.memory_freed_gb) | add'

5. DPI Configuration

High DPI (200-300) significantly improves extraction quality for messy documents:

Memory Impact Analysis:

  • 200 DPI: 4x larger images than the 100 DPI baseline (~1.2MB each)
  • 300 DPI: 9x larger images than the 100 DPI baseline (~2.7MB each)
  • Memory Impact: 4-9x increase during processing
  • Quality Impact: Dramatically better text recognition in complex documents

Adaptive DPI Scaling:

  • 12GB+ Memory: 300 DPI (maximum quality)
  • 8GB+ Memory: 250 DPI (high quality)
  • 4GB+ Memory: 200 DPI (medium quality)
  • <4GB Memory: 150 DPI (conservative)
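
The adaptive scaling table above, combined with the `PDF_DPI` and `ADAPTIVE_DPI` settings below, can be expressed as a small selection function (a sketch; the function name is illustrative):

```python
import os

def select_dpi(available_gb: float) -> int:
    """Pick a render DPI from available memory, capped by PDF_DPI and
    bypassed entirely when ADAPTIVE_DPI is disabled."""
    max_dpi = int(os.environ.get("PDF_DPI", "200"))
    if os.environ.get("ADAPTIVE_DPI", "true").lower() != "true":
        return max_dpi
    if available_gb >= 12:
        dpi = 300   # maximum quality
    elif available_gb >= 8:
        dpi = 250   # high quality
    elif available_gb >= 4:
        dpi = 200   # medium quality
    else:
        dpi = 150   # conservative
    return min(dpi, max_dpi)
```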

Configuration Options:

# Set maximum DPI (default: 200)
PDF_DPI=300

# Enable adaptive scaling (default: true)
ADAPTIVE_DPI=true

6. Database Schema

Add verification tables to your PostgreSQL database:

-- WebAI verification results
CREATE TABLE public.webai_verifications (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tender_id UUID NOT NULL REFERENCES public.tenders(id),
    analysis JSONB NOT NULL,
    comparison JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

-- PostgreSQL requires a separate statement for the index
CREATE INDEX webai_verifications_tender_id_idx
    ON public.webai_verifications (tender_id);

-- Add verification status to tenders
ALTER TABLE public.tenders 
ADD COLUMN verification_status TEXT DEFAULT 'PENDING',
ADD COLUMN verification_score FLOAT DEFAULT 0.0;

Usage

Automatic Verification

The worker automatically processes verification jobs from the queue:

-- Queue a verification job
INSERT INTO public.processing_jobs (tender_id, job_type, payload)
VALUES ('tender-uuid', 'VERIFY', '{}');
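
On the worker side, claiming a queued VERIFY job might look like the sketch below. The `status` column and the `FOR UPDATE SKIP LOCKED` claim pattern are assumptions about the shared `processing_jobs` schema, not confirmed details:

```python
# Hypothetical claim query; column names beyond tender_id/job_type/payload
# are assumptions about the shared schema.
CLAIM_JOB_SQL = """
UPDATE public.processing_jobs
SET status = 'RUNNING'
WHERE id = (
    SELECT id FROM public.processing_jobs
    WHERE job_type = 'VERIFY' AND status = 'PENDING'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, tender_id, payload;
"""

def claim_next_job(conn):
    """Atomically claim one pending VERIFY job, or return None if the
    queue is empty. `conn` is any DB-API connection (e.g. psycopg)."""
    with conn.cursor() as cur:
        cur.execute(CLAIM_JOB_SQL)
        return cur.fetchone()
```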

Manual Testing

Use the Gradio interface to test individual documents:

  1. Upload a PDF or image document
  2. Click "Verify Document"
  3. Review the structured analysis output

Verification Results

Access verification results via the database:

-- Get verification for a tender
SELECT 
    tender_id,
    analysis->>'tenderTitle' as title,
    comparison->>'agreement_score' as agreement_score,
    comparison->'recommendation_comparison' as bid_comparison,
    created_at
FROM public.webai_verifications 
WHERE tender_id = 'your-tender-id';

Comparison Metrics

The worker provides detailed comparison metrics:

  • Agreement Score: 0.0-1.0 overall similarity
  • Bid Decision Comparison: Primary vs WebAI recommendations
  • Confidence Comparison: Model confidence differences
  • Key Differences: Discrepancies requiring human review
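
A minimal sketch of how an agreement score in the 0.0-1.0 range could be computed from the two workers' extracted fields. The field names used in the test are illustrative, not the worker's actual schema:

```python
def agreement_score(primary: dict, webai: dict, fields: list) -> float:
    """Fraction of compared fields on which the primary worker and the
    WebAI worker agree, from 0.0 (no agreement) to 1.0 (full agreement)."""
    if not fields:
        return 0.0
    matches = sum(1 for f in fields if primary.get(f) == webai.get(f))
    return matches / len(fields)
```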

Monitoring

Monitor worker performance through structured logs:

# View recent verification logs
grep "webai-verification-worker" /var/log/app.log | tail -20

# Check agreement score distribution
grep "agreement_score" /var/log/app.log | jq '.agreement_score'

Troubleshooting

Common Issues

  1. OOM Errors: Check that 8-bit quantization is enabled
  2. Slow Processing: Verify ZeroGPU is working (check HF Space logs)
  3. Parsing Errors: WebAI responses may need post-processing
  4. Database Connection: Ensure DATABASE_URL is accessible from HF

Performance Tips

  • Use smaller images when possible
  • Limit max_new_tokens to reduce memory usage
  • Monitor GPU allocation in HF Space metrics
  • Consider upgrading to paid tier for higher throughput

Cost Optimization

  • Free Tier: ~20-30 documents/hour with 4B model, FlashAttention-2, and adaptive DPI
  • Paid Tier: Linear scaling with GPU allocation
  • Batch Processing: Queue multiple jobs for efficiency
  • Caching: Reuse cached document embeddings when possible
  • Memory Efficiency: FlashAttention-2 reduces attention memory by ~40%
  • DPI Impact: High DPI reduces throughput by ~15-25% but dramatically improves quality