---
language: en
license: mit
library_name: pytorch
pipeline_tag: image-classification
base_model:
  - timm/vit_small_patch16_224.augreg_in21k_ft_in1k
datasets:
  - uoft-cs/cifar100
metrics:
  - accuracy
tags:
  - image-classification
  - vision-transformer
  - lora
  - peft
  - cifar100
  - pytorch
model-index:
  - name: vit-lora-cifar100
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          type: uoft-cs/cifar100
          name: CIFAR-100
        metrics:
          - type: accuracy
            name: Top-1 Accuracy
            value: 89.74
---

# ViT-S + LoRA for CIFAR-100

This model is a Vision Transformer Small (`vit_small_patch16_224`) adapted with LoRA for 100-class image classification on CIFAR-100.

It is intended for users who want to download a ready-to-use classifier and run inference.

## Model At A Glance

- Architecture: ViT-Small (`vit_small_patch16_224`)
- Adaptation: LoRA (PEFT)
- Task: image classification
- Labels: 100 classes
- Dataset: CIFAR-100
- Best configuration: `r=8`, `alpha=8`, `dropout=0.1`

## Model Structure After Fine-Tuning

The uploaded weights are for a ViT-S model with LoRA adapters and a trainable classification head.

1. Backbone: `vit_small_patch16_224` (pretrained ViT-S)
2. LoRA adapters: injected into fused attention `qkv` projection layers
3. Classification head: output dimension updated to 100 classes
4. Parameter-efficient setup: base backbone mostly frozen, LoRA + head trained

### Fine-tuned architecture summary

| Component | Status After Fine-tuning |
|---|---|
| Patch embedding + transformer backbone | Frozen (except LoRA pathways) |
| Attention `qkv` projection | LoRA adapters active |
| MLP blocks | Frozen |
| Classification head | Trainable |
| Output classes | 100 (CIFAR-100) |

The uploaded `model.pt` is a state dict that contains both base model parameters and LoRA parameters.

## Performance

| Metric | Value |
|---|---:|
| Top-1 Accuracy | 89.74% |
| Best Validation Accuracy | 89.14% |
| Trainable Parameters | 185,956 |

## Recommended Use Cases

- Direct image classification inference with a ViT+LoRA model
- Educational demos for Hugging Face model download and inference
- Baseline testing on CIFAR-100-style natural images

## Out-of-Scope Use Cases

- Safety-critical decision making
- Medical/legal/financial decisions
- Deployment to domains very different from CIFAR-style natural images without additional validation

## Limitations

- Evaluated only on CIFAR-100
- Accuracy may drop on real-world or shifted data distributions
- Not designed as a robustness- or fairness-optimized model

## Bias, Risks, and Responsible Use

- Class imbalance and dataset artifacts may influence predictions
- Misclassifications are expected for ambiguous or out-of-distribution inputs
- Always include human review before real-world decisions

## How To Use 

This model repo stores:

- `model.pt` (PyTorch state dict)
- `config.json` (model + LoRA settings)

To use the model, you must recreate the same architecture and then load `model.pt`.

### Install dependencies 

```bash
pip install torch timm peft huggingface_hub torchvision pillow
```

### Download from Hugging Face and load 

```python
import torch
import timm
from peft import LoraConfig, get_peft_model
from huggingface_hub import hf_hub_download


def build_model(num_classes=100, rank=8, alpha=8, lora_dropout=0.1):
  # 1) Base ViT-S
  model = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=num_classes)

  # 2) Freeze base params
  for p in model.parameters():
    p.requires_grad = False

  # 3) Keep head trainable
  for p in model.head.parameters():
    p.requires_grad = True

  # 4) Attach LoRA to fused qkv projection
  lora_cfg = LoraConfig(
    r=rank,
    lora_alpha=alpha,
    lora_dropout=lora_dropout,
    target_modules=["qkv"],
    bias="none",
    modules_to_save=["head"],
  )
  model = get_peft_model(model, lora_cfg)
  return model

repo_id = "priyadip/vit-lora-cifar100"
weights_path = hf_hub_download(repo_id=repo_id, filename="model.pt")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_model(
  num_classes=100,
  rank=8,
  alpha=8,
  lora_dropout=0.1,
).to(device)

state_dict = torch.load(weights_path, map_location=device)
model.load_state_dict(state_dict)
model.eval()
```

### Minimal inference example

```python
import torch
from PIL import Image
import torchvision.transforms as transforms

# Use the same normalization expected by this model
transform = transforms.Compose([
  transforms.Resize(224),
  transforms.ToTensor(),
  transforms.Normalize(mean=[0.5071, 0.4867, 0.4408],
             std=[0.2675, 0.2565, 0.2761]),
])

img = Image.open("your_image.png").convert("RGB")
x = transform(img).unsqueeze(0).to(next(model.parameters()).device)

with torch.no_grad():
  logits = model(x)
  pred = logits.argmax(dim=1).item()

print("Predicted class index:", pred)
```

Note: the output is a CIFAR-100 class index (0-99). Map indices to class names according to your CIFAR-100 label list.

If needed, fetch config directly from the repo:

```python
from huggingface_hub import hf_hub_download
import json

cfg_path = hf_hub_download(repo_id="priyadip/vit-lora-cifar100", filename="config.json")
cfg = json.load(open(cfg_path))
print(cfg)
```

## Visuals

![Training Curves](assets/LoRA_r8_a8_d0.1_curves.png)

![Class-wise Accuracy](assets/LoRA_r8_a8_d0.1_classwise.png)

![LoRA Gradient Norms](assets/LoRA_r8_a8_d0.1_gradients.png)

### What We Can Observe From These Visuals

- **Training curves:** The train/validation curves improve smoothly and stay close, which suggests stable optimization and good generalization for this setup.
- **Class-wise accuracy:** Performance is strong across most CIFAR-100 classes, indicating that adaptation is not concentrated on only a few easy categories.
- **LoRA gradient norms:** Gradients remain active and controlled during training, showing that LoRA adapters receive useful updates without unstable spikes.

### Why LoRA ?

LoRA adds small trainable low-rank adapters inside attention layers instead of updating the full backbone. This gives three practical advantages:

1. **Parameter efficiency:** far fewer trainable parameters than full fine-tuning.
2. **Stable transfer:** pretrained ViT knowledge is largely preserved while adapters specialize to CIFAR-100.
3. **Better compute/memory trade-off:** strong accuracy with lighter optimization cost.

These plots support that LoRA provides an effective balance between accuracy and training efficiency for this model.

## Framework Versions

- PyTorch
- timm
- PEFT (LoRA)

## Citation

- ViT paper (original architecture): https://arxiv.org/abs/2010.11929
- timm library (implementation used): https://github.com/huggingface/pytorch-image-models
- timm ViT-S model entry (reference checkpoint page): https://huggingface.co/timm/vit_small_patch16_224.augreg_in21k_ft_in1k

```bibtex
@inproceedings{dosovitskiy2021vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
```
```bibtex
@inproceedings{hu2022lora,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2022}
}
```
```bibtex
@misc{mangrulkar2022peft,
  title={PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
  author={Mangrulkar, Sourab and Paul, Sayak and others},
  year={2022},
  howpublished={\url{https://github.com/huggingface/peft}}
}
```
```bibtex
@misc{rw2019timm,
  title={PyTorch Image Models},
  author={Wightman, Ross},
  year={2019},
  howpublished={\url{https://github.com/huggingface/pytorch-image-models}}
}
```