The Complete Detection Flow
When you visit a webpage with images, Qwip analyzes each image through a multi-step process that
never uploads your images to our servers. Here's exactly what happens:
1. Image Discovery
The extension uses a MutationObserver to detect when images appear on the page (including lazy-loaded images).
Small images (<128×128px) and huge images (>4096px) are automatically skipped to save resources.
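The discovery step can be sketched as a size gate applied to each image the MutationObserver reports. Only the 128px and 4096px cutoffs come from the text; the function name and observer wiring are illustrative:

```typescript
// Size gate for discovered images. The 128px/4096px cutoffs are from the
// docs; the function name and observer wiring below are illustrative.
function shouldAnalyze(width: number, height: number): boolean {
  const MIN_DIM = 128;  // skip icons, avatars, tracking pixels
  const MAX_DIM = 4096; // skip huge images to bound memory and compute
  return (
    width >= MIN_DIM && height >= MIN_DIM &&
    width <= MAX_DIM && height <= MAX_DIM
  );
}

// In the extension, a MutationObserver would feed this predicate as
// images (including lazy-loaded ones) appear in the DOM:
//   new MutationObserver(muts => { /* collect added <img> nodes */ })
//     .observe(document.body, { childList: true, subtree: true });
```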
2. Hash Computation (Local)
Your browser computes 6 cryptographic/perceptual hashes of the image using WebAssembly:
5 perceptual hashes (pHash variants) and 1 BLAKE3 content hash. This happens entirely in your browser.
3. Database Query (Optional)
If enabled, the BLAKE3 hash (just the hash, not the image) is sent to api.qwip.io to check if
this image has been analyzed before by the community. If found, you get instant results from the
community database.
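A minimal sketch of the hash-only lookup, assuming a hypothetical `/v1/lookup` endpoint; the path, query parameter, and response shape are illustrative, not the documented API:

```typescript
// Hypothetical lookup helper: only the 64-character BLAKE3 hex digest
// leaves the browser, never the image. The /v1/lookup path and the
// response shape are assumptions for this sketch.
interface CommunityResult {
  likelyAi: boolean;
  confidence: number;
  voteCount: number;
}

function buildLookupUrl(blake3Hex: string): string {
  if (!/^[0-9a-f]{64}$/.test(blake3Hex)) {
    throw new Error("expected a 64-character BLAKE3 hex digest");
  }
  return `https://api.qwip.io/v1/lookup?blake3=${blake3Hex}`;
}

// The query itself would be a single GET; a 404 simply means the
// community has not analyzed this image yet:
//   const res = await fetch(buildLookupUrl(hash));
//   if (res.status === 404) return null;
//   const result = (await res.json()) as CommunityResult;
```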
4. Color Pre-filter
Before running the ML model, a color-entropy analysis checks whether the image looks photorealistic.
Illustrations, cartoons, icons, and heavily-edited artwork are automatically skipped, reducing
false positives and avoiding wasted compute on non-photographic content.
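One way such a pre-filter can work is a Shannon-entropy check over a pixel histogram: photographs spread values across many bins, while flat cartoons and icons concentrate them. The 64-bin layout and the 3.0-bit threshold here are assumptions for the sketch, not Qwip's actual parameters:

```typescript
// Illustrative entropy pre-filter. Bin count and threshold are assumed;
// the extension's real analysis may differ.
function luminanceEntropy(rgb: Uint8Array): number {
  const bins = new Array(64).fill(0);
  const pixels = rgb.length / 3;
  for (let i = 0; i < rgb.length; i += 3) {
    // Standard luma weights for RGB -> luminance.
    const y = 0.299 * rgb[i] + 0.587 * rgb[i + 1] + 0.114 * rgb[i + 2];
    bins[Math.min(63, Math.floor(y / 4))]++;
  }
  let entropy = 0;
  for (const count of bins) {
    if (count === 0) continue;
    const p = count / pixels;
    entropy -= p * Math.log2(p);
  }
  return entropy; // in bits; a flat single-color image scores 0
}

function looksPhotographic(rgb: Uint8Array): boolean {
  return luminanceEntropy(rgb) > 3.0; // assumed threshold
}
```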
5. Local ML Inference
If the image is not in the database and passes the pre-filter, it is analyzed using ONNX Runtime Web.
The default model is MobileCLIP (256×256 input), with MobileNetV2 and Swin Transformer
also available. Inference runs entirely in your browser via WASM or WebGPU; no image data is transmitted.
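The inference step can be sketched in two parts: preprocessing pixels into the planar float32 layout CLIP-style encoders expect, then running the session. The mean/std values are the widely published CLIP normalization constants; whether Qwip's fine-tuned model uses exactly these, and the model/input names in the commented call, are assumptions:

```typescript
// CHW float32 preprocessing for a CLIP-style encoder. Mean/std are the
// commonly published CLIP constants (assumed to match Qwip's model).
const CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073];
const CLIP_STD = [0.26862954, 0.26130258, 0.27577711];

// Convert interleaved RGB bytes (HWC) into the planar, normalized
// float32 layout (1, 3, size, size) that the model expects.
function toClipTensor(rgb: Uint8Array, size = 256): Float32Array {
  const plane = size * size;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      out[c * plane + i] = (rgb[i * 3 + c] / 255 - CLIP_MEAN[c]) / CLIP_STD[c];
    }
  }
  return out;
}

// Inference via onnxruntime-web (model URL and tensor names assumed):
//   import * as ort from "onnxruntime-web";
//   const session = await ort.InferenceSession.create("mobileclip.onnx", {
//     executionProviders: ["webgpu", "wasm"], // WebGPU first, WASM fallback
//   });
//   const input = new ort.Tensor("float32", toClipTensor(rgb), [1, 3, 256, 256]);
//   const output = await session.run({ pixel_values: input });
```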
6. Visual Annotation
AI-generated images get a red border, while real images get a subtle green checkmark. The extension also
increments a counter showing how many AI images you've encountered.
7. Community Contribution (Optional)
If enabled, your detection result (hash + confidence + model) is anonymously contributed to the
community database to help future users. Multiple detections are aggregated using weighted averaging.
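A hypothetical shape for such a contribution, reflecting the "hash + confidence + model" description above; the field names and the `/v1/contribute` endpoint are assumptions, not the documented API:

```typescript
// Hypothetical contribution payload: a hash, the verdict, and the model
// name only -- no page URL, user ID, or image bytes. Field names and the
// endpoint below are assumptions for this sketch.
interface Contribution {
  blake3: string;
  likelyAi: boolean;
  confidence: number; // 0..1 from the local model
  model: string;      // e.g. "mobileclip"
}

function buildContribution(
  blake3: string, likelyAi: boolean, confidence: number, model: string
): Contribution {
  if (confidence < 0 || confidence > 1) {
    throw new Error("confidence must be in [0, 1]");
  }
  return { blake3, likelyAi, confidence, model };
}

// Sending it would be a single POST:
//   await fetch("https://api.qwip.io/v1/contribute", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(buildContribution(hash, true, 0.93, "mobileclip")),
//   });
```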
The ML Models
Three models are available, selectable in the extension popup. All run entirely in your browser.
MobileCLIP Fake Detector (Default)
Model Specifications
Architecture: MobileCLIP (CLIP-based vision encoder)
Input Size: 256×256×3 RGB
Normalization: CLIP-style (mean/std per channel)
Output: Single probability (AI likelihood)
Inference Time: ~40ms (WASM) / ~15ms (WebGPU)
Model Size: 43MB
The default model uses a CLIP-based architecture fine-tuned to detect AI-generated images.
Its higher input resolution (256ร256) and semantic understanding make it the most robust option
for modern AI generators.
MobileNetV2 (Fast)
Model Specifications
Architecture: MobileNetV2
Input Size: 224×224×3 RGB
Normalization: ImageNet standard
Output: [AI, Real] class probabilities
Inference Time: ~20ms (WASM) / ~8ms (WebGPU)
Model Size: ~9MB
A lightweight, fast option best suited for lower-powered devices or when you want lower latency.
Slightly less accurate than MobileCLIP on newer generators.
Swin Transformer (High Accuracy)
Model Specifications
Architecture: Swin Transformer
Input Size: 224×224×3 RGB
Normalization: ImageNet standard
Output: [AI, Real] class probabilities
Inference Time: ~200ms (WASM) / ~60ms (WebGPU)
Model Size: 91MB
The most accurate model in the lineup. Recommended for cases where precision matters more than
speed. WebGPU acceleration is strongly recommended; WASM is noticeably slower.
Known Limitation: All models can produce false positives on heavily-edited real images
(vibrant photos, YouTube thumbnails, HDR shots). We're actively expanding training data with pre-2020
edited photography to reduce these cases.
Hash-Based Privacy System
Why Hashes Instead of Images?
Instead of uploading your images to our servers (which would be a privacy nightmare), we compute
mathematical "fingerprints" called hashes. These hashes can be used to identify similar images
without ever seeing the actual image content.
The 6 Hashes We Use
- Mean Hash (aHash-style): Average pixel values in an 8×8 grid
- Gradient Hash: Edge detection-based fingerprint
- Double Gradient Hash: Enhanced gradient with dual passes
- Block Hash: Block-median based hash
- DCT Hash: Discrete Cosine Transform-based
- BLAKE3 Content Hash: Exact cryptographic hash for deduplication
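As an illustration of how the simplest of these works, a mean hash can be computed from a downscaled 8×8 grayscale grid: each bit records whether a cell is brighter than the grid average, giving a 64-bit fingerprint that survives resizing and mild recompression. This sketch assumes the downscaling has already happened:

```typescript
// Illustrative mean-hash over a pre-downscaled 8x8 grayscale grid.
// Each of the 64 bits marks whether a cell exceeds the grid's average.
function meanHash(gray8x8: number[]): bigint {
  if (gray8x8.length !== 64) throw new Error("expected 64 grayscale cells");
  const avg = gray8x8.reduce((a, b) => a + b, 0) / 64;
  let hash = 0n;
  for (const value of gray8x8) {
    hash = (hash << 1n) | (value > avg ? 1n : 0n);
  }
  return hash; // a 64-bit fingerprint, like the decimal values shown below
}
```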
Example Hash Vector
{
  "mean": "18379468920823898112",
  "gradient": "18015498021093556224",
  "doubleGradient": "18374389475892961280",
  "block": "18302628773641904128",
  "dct": "18374673854875439104",
  "blake3": "a7f3d8c9e2b4f1a6..."  // 64 hex characters
}
These hashes allow us to detect if you've seen the same (or very similar) image before without
storing or transmitting the actual image. The BLAKE3 hash is used for exact matches, while the
perceptual hashes can detect near-duplicates and edited versions.
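Near-duplicate matching on perceptual hashes is typically done by Hamming distance, the number of differing bits between two 64-bit fingerprints. The distance threshold here is an illustrative choice, not Qwip's actual cutoff:

```typescript
// Hamming distance between two 64-bit perceptual hashes: small distance
// means visually similar images (e.g. a recompressed or lightly edited
// copy). The threshold of 10 bits is an assumption for this sketch.
function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b;
  let bits = 0;
  while (x > 0n) {
    bits += Number(x & 1n);
    x >>= 1n;
  }
  return bits;
}

function isNearDuplicate(a: bigint, b: bigint, threshold = 10): boolean {
  return hammingDistance(a, b) <= threshold;
}
```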
Community Database
How It Works
The community database at api.qwip.io stores detection results contributed by users:
- No images stored: Only hashes and metadata
- No user tracking: Contributions are completely anonymous
- Vote aggregation: Multiple detections are combined using weighted averaging
- Open API: Anyone can query and contribute (rate-limited by IP)
Database Schema
Images Table
CREATE TABLE images (
  blake3_hash VARCHAR(64) PRIMARY KEY,
  hash_mean BIGINT,
  hash_gradient BIGINT,
  hash_double_gradient BIGINT,
  hash_block BIGINT,
  hash_dct BIGINT,
  likely_ai BOOLEAN,
  confidence FLOAT,
  vote_count INTEGER,
  model_used VARCHAR(50),
  first_seen TIMESTAMP,
  last_seen TIMESTAMP
);
When multiple users analyze the same image, their confidence scores are aggregated:
Vote Aggregation Algorithm
new_confidence = (old_confidence × vote_count + new_confidence) ÷ (vote_count + 1)
vote_count = vote_count + 1
This simple weighted average ensures that as more people analyze an image, the confidence score
becomes more reliable.
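The aggregation formula above is an incremental (running) mean, which can be written as a small pure function; the names here are illustrative:

```typescript
// Incremental mean over contributed confidence scores, matching the
// vote aggregation formula above. Names are illustrative.
interface Aggregate {
  confidence: number;
  voteCount: number;
}

function addVote(agg: Aggregate, newConfidence: number): Aggregate {
  return {
    confidence:
      (agg.confidence * agg.voteCount + newConfidence) / (agg.voteCount + 1),
    voteCount: agg.voteCount + 1,
  };
}
```

Because it only stores the running mean and the count, the database never needs to keep individual votes to stay consistent.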
Performance Optimizations
What We Do to Keep Things Fast
- Lazy loading detection: Only processes images when they become visible
- Intelligent caching: Results cached locally to avoid re-processing
- Size filtering: Skips tiny icons and massive images
- WebGPU acceleration: Optional GPU acceleration on Chrome 120+
- Race condition prevention: Atomic checks prevent duplicate processing
- Memory management: Aggressive cleanup of blob URLs and DOM elements
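The caching optimization can be sketched as a map keyed by the BLAKE3 content hash, so the same image is never hashed and classified twice in a session. This minimal in-memory version is an assumption; a real extension would also persist entries (e.g. via `chrome.storage`) and cap the cache size:

```typescript
// Minimal local result cache keyed by BLAKE3 hash. In-memory only;
// persistence and eviction are left out of this sketch.
interface CachedResult {
  likelyAi: boolean;
  confidence: number;
}

class ResultCache {
  private entries = new Map<string, CachedResult>();

  get(blake3: string): CachedResult | undefined {
    return this.entries.get(blake3);
  }

  set(blake3: string, result: CachedResult): void {
    this.entries.set(blake3, result);
  }
}
```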
Typical Performance
Processing Times (Average)
Color pre-filter: ~5ms
Hash computation: 5-10ms
Model inference (MobileCLIP, default): ~40ms WASM / ~15ms WebGPU
Model inference (MobileNetV2, fast): ~20ms WASM / ~8ms WebGPU
Model inference (Swin Transformer, accurate): ~200ms WASM / ~60ms WebGPU
Server query: 50-200ms (if not cached)
Total (typical, MobileCLIP + server): 50-260ms per image
What Happens Offline?
The core detection pipeline works completely offline. Here's what happens when you're not connected to the internet:
- ✅ Local ML inference still works (no internet needed)
- ✅ Images are still analyzed and annotated
- ✅ All privacy protections remain active
- ❌ Can't query the community database for cached results
- ❌ Can't contribute results back to the community
When you reconnect, pending contributions are NOT automatically sent (we never queue data without your knowledge).
Want More Technical Details?
Check out our full documentation and open-source code.