The Complete Detection Flow

When you visit a webpage with images, Qwip analyzes each image through a multi-step process that never uploads your images to our servers. Here's exactly what happens:

1. Image Discovery

The extension uses a MutationObserver to detect when images appear on the page (including lazy-loaded images). Small images (<128×128px) and huge images (>4096px) are automatically skipped to save resources.
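As a rough sketch in TypeScript (names are illustrative, though the size limits match the ones above), the discovery step might look like this:

const MIN_SIDE = 128;
const MAX_SIDE = 4096;

function isCandidate(img: HTMLImageElement): boolean {
  const w = img.naturalWidth;
  const h = img.naturalHeight;
  return w >= MIN_SIDE && h >= MIN_SIDE && w <= MAX_SIDE && h <= MAX_SIDE;
}

function handleImage(img: HTMLImageElement): void {
  // Lazy-loaded images may not be decoded yet; wait for the load event.
  if (!img.complete) {
    img.addEventListener("load", () => handleImage(img), { once: true });
    return;
  }
  if (isCandidate(img)) {
    // hand off to hashing and inference (steps 2-4)
  }
}

const observer = new MutationObserver((mutations) => {
  for (const m of mutations) {
    for (const node of m.addedNodes) {
      if (node instanceof HTMLImageElement) handleImage(node);
      else if (node instanceof Element) node.querySelectorAll("img").forEach(handleImage);
    }
  }
});
observer.observe(document.documentElement, { childList: true, subtree: true });

// Images already present at injection time:
document.querySelectorAll("img").forEach(handleImage);

Observing the whole document with subtree: true is what catches lazy-loaded images: they enter the DOM (or finish loading) after the initial page load.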

2. Hash Computation (Local)

Your browser computes six hashes of the image using WebAssembly: five perceptual hashes (pHash variants) and one BLAKE3 cryptographic content hash. This happens entirely in your browser.

3. Database Query (Optional)

If enabled, the BLAKE3 hash (just the hash, not the image) is sent to api.qwip.io to check if this image has been analyzed before by the community. If found, you get instant results from the community database.
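The lookup might look like the sketch below; the endpoint path and response fields are assumptions for illustration (loosely mirroring the database schema later on this page), not the actual API:

interface LookupResult {
  likely_ai: boolean;
  confidence: number;
  vote_count: number;
}

// Only the 64-character BLAKE3 hex digest leaves the browser.
async function queryCommunityDb(blake3: string): Promise<LookupResult | null> {
  const res = await fetch(`https://api.qwip.io/v1/images/${blake3}`);
  if (res.status === 404) return null; // unseen image: fall through to local inference
  if (!res.ok) throw new Error(`lookup failed: ${res.status}`);
  return (await res.json()) as LookupResult;
}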

4. Local ML Inference

If the image isn't in the database, it is analyzed locally using ONNX Runtime Web with one of two MobileViT models (CiFake or GenImage). Inference runs entirely in your browser using WASM, or optionally WebGPU for acceleration, and takes 20-50ms on average.
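In onnxruntime-web terms, this step might look like the sketch below. The model path, the input/output tensor names, and the NCHW layout are assumptions about the model file, not confirmed details:

import * as ort from "onnxruntime-web";

// WebGPU support depends on the onnxruntime-web build; the runtime
// falls back to the next provider in the list.
// Create the session once and reuse it across images.
const sessionPromise = ort.InferenceSession.create("models/cifake.onnx", {
  executionProviders: ["webgpu", "wasm"],
});

// `pixels` is the preprocessed 32x32 RGB image, flattened channel-first.
async function classify(pixels: Float32Array): Promise<number> {
  const session = await sessionPromise;
  const input = new ort.Tensor("float32", pixels, [1, 3, 32, 32]);
  const outputs = await session.run({ input });
  // Assume a single output holding P(AI-generated).
  const out = outputs[session.outputNames[0]];
  return (out.data as Float32Array)[0];
}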

5. Visual Annotation

AI-generated images get a red border; real images get a subtle green checkmark. The extension also increments a counter showing how many AI images you've encountered.
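A sketch of the annotation step (the exact styling here is illustrative, not the extension's actual CSS):

let aiImageCount = 0; // surfaced elsewhere as the "AI images seen" counter

function annotate(img: HTMLImageElement, likelyAi: boolean): void {
  if (likelyAi) {
    img.style.outline = "3px solid #e11"; // red border for AI-generated images
    aiImageCount += 1;
  } else {
    // subtle green checkmark next to real images
    const check = document.createElement("span");
    check.textContent = "\u2713";
    check.style.cssText = "color:#2a2;font-size:12px;vertical-align:top";
    img.insertAdjacentElement("afterend", check);
  }
}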

6. Community Contribution (Optional)

If enabled, your detection result (hash + confidence + model) is anonymously contributed to the community database to help future users. Multiple detections are aggregated using weighted averaging.
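The contribution could be a small POST along these lines (a sketch: the endpoint and field names are assumptions, chosen to match the schema shown below):

interface Contribution {
  blake3_hash: string;
  likely_ai: boolean;
  confidence: number;
  model_used: string; // "cifake" | "genimage"
}

// No image bytes, no page URL, no user identifier: just the verdict.
async function contribute(c: Contribution): Promise<void> {
  await fetch("https://api.qwip.io/v1/contributions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(c),
  });
}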

The ML Models

CiFake Model (Default)

Model Specifications

Architecture: MobileViT
Input Size: 32×32×3 RGB
Parameters: ~1 million
Training Data: CIFAKE dataset
Inference Time: ~20ms (WASM) / ~10ms (WebGPU)
Model Size: 4MB

The CiFake model was trained to distinguish real images from AI-generated ones using the CIFAKE dataset. It's extremely lightweight and fast, but the low input resolution (32×32) means it can miss fine details.

GenImage Model (Alternative)

Model Specifications

Architecture: MobileViT
Input Size: 64×64×3 RGB
Parameters: ~5 million
Training Data: GenImage dataset
Inference Time: ~50ms (WASM) / ~25ms (WebGPU)
Model Size: 18MB

The GenImage model accepts input with four times as many pixels (64×64 vs 32×32) and is trained on a different dataset. It may perform better on certain types of generated images, but it is slower.

Known Limitation: Both models have relatively low input resolutions (32×32 and 64×64) which can lead to false positives. We're actively researching higher-resolution models with better accuracy. See our limitations documentation for full transparency.

Hash-Based Privacy System

Why Hashes Instead of Images?

Instead of uploading your images to our servers (which would be a privacy nightmare), we compute mathematical "fingerprints" called hashes. These hashes can be used to identify similar images without ever seeing the actual image content.

The 6 Hashes We Use

  1. Mean Hash (aHash): average pixel values over an 8×8 grid (sketched after this list)
  2. Gradient Hash: Edge detection-based fingerprint
  3. Double Gradient Hash: Enhanced gradient with dual passes
  4. Block Hash: Block-median based hash
  5. DCT Hash: Discrete Cosine Transform-based
  6. BLAKE3 Content Hash: Exact cryptographic hash for deduplication
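To make the idea concrete, here is a sketch of the mean hash from item 1, assuming the image has already been downscaled to an 8×8 grayscale grid (64 values): one bit per pixel, set when that pixel is brighter than the grid's mean. The decimal string output matches the format in the example vector below.

function meanHash(gray: Float32Array): string {
  // gray holds 64 luminance values from an 8x8 downscale
  const mean = gray.reduce((a, b) => a + b, 0) / gray.length;
  let bits = 0n;
  for (let i = 0; i < 64; i++) {
    bits = (bits << 1n) | (gray[i] > mean ? 1n : 0n);
  }
  return bits.toString(); // 64-bit hash as a decimal string
}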

Example Hash Vector

{
  "mean": "18379468920823898112",
  "gradient": "18015498021093556224",
  "doubleGradient": "18374389475892961280",
  "block": "18302628773641904128",
  "dct": "18374673854875439104",
  "blake3": "a7f3d8c9e2b4f1a6..." // 64 hex characters
}

These hashes allow us to detect if you've seen the same (or very similar) image before without storing or transmitting the actual image. The BLAKE3 hash is used for exact matches, while the perceptual hashes can detect near-duplicates and edited versions.
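Near-duplicate matching on the perceptual hashes reduces to Hamming distance: two images count as "similar" when only a few of the 64 bits differ. A sketch, with an illustrative (not the extension's actual) threshold:

function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b;
  let count = 0;
  while (x > 0n) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

// The example vector's mean hash against a near-duplicate (one flipped bit):
const a = 18379468920823898112n;
const b = a ^ (1n << 10n);
const similar = hammingDistance(a, b) <= 10; // distance 1 -> true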

Community Database

How It Works

The community database at api.qwip.io stores detection results contributed by users:

Database Schema

Images Table

CREATE TABLE images (
  blake3_hash VARCHAR(64) PRIMARY KEY,
  hash_mean BIGINT,
  hash_gradient BIGINT,
  hash_double_gradient BIGINT,
  hash_block BIGINT,
  hash_dct BIGINT,
  likely_ai BOOLEAN,
  confidence FLOAT,
  vote_count INTEGER,
  model_used VARCHAR(50),
  first_seen TIMESTAMP,
  last_seen TIMESTAMP
);

When multiple users analyze the same image, their confidence scores are aggregated:

Vote Aggregation Algorithm

new_confidence = (old_confidence × vote_count + incoming_confidence)
                 ÷ (vote_count + 1)

vote_count = vote_count + 1

This simple weighted average ensures that as more people analyze an image, the confidence score becomes more reliable.
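The same update as a function, with a worked example (names illustrative):

function aggregateVote(oldConfidence: number, voteCount: number, incoming: number) {
  return {
    confidence: (oldConfidence * voteCount + incoming) / (voteCount + 1),
    voteCount: voteCount + 1,
  };
}

// Three users at 0.9, then a fourth at 0.5:
// (0.9 * 3 + 0.5) / 4 = 0.8
const updated = aggregateVote(0.9, 3, 0.5); // confidence ~0.8, voteCount 4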

Performance Optimizations

What We Do to Keep Things Fast

The biggest wins come from steps you've already seen: tiny and oversized images are skipped outright, the community database is checked before any local inference runs, and hashing and inference run in WebAssembly (or WebGPU where available).

Typical Performance

Processing Times (Average)

Hash computation: 5-10ms
Model inference (CiFake): 20ms WASM / 10ms WebGPU
Model inference (GenImage): 50ms WASM / 25ms WebGPU
Server query: 50-200ms (if not cached)

Total: 25-260ms per image

What Happens Offline?

The extension works completely offline! Hash computation and ML inference both run locally in your browser, so detection keeps working; only the optional community database lookup and contribution steps are skipped.

When you reconnect, pending contributions are NOT automatically sent (we never queue data without your knowledge).

Want More Technical Details?

Check out our full documentation and open-source code.

Full Documentation → API Reference → View Source Code →