The Complete Detection Flow
When you visit a webpage with images, Qwip analyzes each image through a multi-step process that
never uploads your images to our servers. Here's exactly what happens:
Step 1: Image Discovery
The extension uses a MutationObserver to detect when images appear on the page (including lazy-loaded images).
Small images (<128×128px) and huge images (>4096px) are automatically skipped to save resources.
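A minimal sketch of this discovery pass, in TypeScript; the helper names and exact checks are illustrative, and analyzeImage stands in for the rest of the pipeline:

Image Discovery Sketch
const MIN_SIDE = 128;  // skip icons and thumbnails
const MAX_SIDE = 4096; // skip very large images

// Stand-in for the rest of the pipeline (hashing, lookup, inference).
function analyzeImage(img: HTMLImageElement): void {
  console.log("analyzing", img.currentSrc);
}

function isCandidate(img: HTMLImageElement): boolean {
  const w = img.naturalWidth;
  const h = img.naturalHeight;
  return w >= MIN_SIDE && h >= MIN_SIDE && w <= MAX_SIDE && h <= MAX_SIDE;
}

function handleImage(img: HTMLImageElement): void {
  if (img.complete) {
    if (isCandidate(img)) analyzeImage(img);
  } else {
    // Lazy-loaded images report 0×0 until they actually load.
    img.addEventListener("load", () => {
      if (isCandidate(img)) analyzeImage(img);
    }, { once: true });
  }
}

const observer = new MutationObserver((mutations) => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      if (node instanceof HTMLImageElement) handleImage(node);
    }
  }
});
observer.observe(document.body, { childList: true, subtree: true });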
Step 2: Hash Computation (Local)
Your browser computes 6 cryptographic/perceptual hashes of the image using WebAssembly:
5 perceptual hashes (pHash variants) and 1 BLAKE3 content hash. This happens entirely in your browser.
Step 3: Database Query (Optional)
If enabled, the BLAKE3 hash (just the hash, not the image) is sent to api.qwip.io to check if
this image has been analyzed before by the community. If found, you get instant results from the
community database.
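In code, the lookup is a single request keyed by the hash. The endpoint path and response fields below are assumptions for illustration; the important property is that only the hash string ever leaves the browser:

Community Lookup Sketch
interface CommunityResult {
  likely_ai: boolean;
  confidence: number;
  vote_count: number;
}

// Hypothetical lookup by BLAKE3 hash; the image itself is never sent.
async function queryCommunity(blake3: string): Promise<CommunityResult | null> {
  const res = await fetch(`https://api.qwip.io/v1/images/${blake3}`);
  if (res.status === 404) return null; // not yet in the community database
  if (!res.ok) throw new Error(`lookup failed: ${res.status}`);
  return res.json();
}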
Step 4: Local ML Inference
If not in database, the image is analyzed using ONNX Runtime Web with one of two MobileViT models
(CiFake or GenImage). This runs entirely in your browser using WASM or optionally WebGPU for acceleration.
Processing takes 20-50ms on average.
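A sketch of the inference call with ONNX Runtime Web; the model path, tensor layout, and output interpretation are assumptions for illustration:

Inference Sketch (ONNX Runtime Web)
import * as ort from "onnxruntime-web";

// Prefer WebGPU when available and fall back to WASM, as described above.
const session = await ort.InferenceSession.create("models/cifake.onnx", {
  executionProviders: ["webgpu", "wasm"],
});

// Assumes `pixels` holds a 32×32 RGB image as floats in NCHW layout.
async function classify(pixels: Float32Array): Promise<number> {
  const input = new ort.Tensor("float32", pixels, [1, 3, 32, 32]);
  const outputs = await session.run({ [session.inputNames[0]]: input });
  const scores = outputs[session.outputNames[0]].data as Float32Array;
  return scores[0]; // assumed to be the probability the image is AI-generated
}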
Step 5: Visual Annotation
AI-generated images get a red border, real images get a subtle green checkmark. The extension also
increments a counter showing how many AI images you've encountered.
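The annotation itself is ordinary DOM styling. The exact styles and counter wiring below are assumptions:

Annotation Sketch
let aiImageCount = 0; // hypothetical counter surfaced in the extension badge

function annotate(img: HTMLImageElement, likelyAi: boolean): void {
  if (likelyAi) {
    img.style.outline = "3px solid #e53935"; // red border for AI-generated
    aiImageCount += 1;
  } else {
    // The extension overlays a subtle green checkmark; a thin outline
    // stands in for it in this sketch.
    img.style.outline = "1px solid #43a047";
  }
}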
Step 6: Community Contribution (Optional)
If enabled, your detection result (hash + confidence + model) is anonymously contributed to the
community database to help future users. Multiple detections are aggregated using weighted averaging.
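A sketch of what a contribution request can look like; the endpoint and field names are assumptions. Note that the payload contains only hashes and metadata, never pixels, page URLs, or user identifiers:

Contribution Sketch
interface HashVector {
  mean: string;
  gradient: string;
  doubleGradient: string;
  block: string;
  dct: string;
  blake3: string;
}

// Hypothetical anonymous contribution of a local detection result.
async function contribute(hashes: HashVector, likelyAi: boolean,
                          confidence: number, model: string): Promise<void> {
  await fetch("https://api.qwip.io/v1/contributions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...hashes, likely_ai: likelyAi, confidence, model }),
  });
}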
The ML Models
CiFake Model (Default)
Model Specifications
Architecture: MobileViT
Input Size: 32×32×3 RGB
Parameters: ~1 million
Training Data: CIFAKE dataset
Inference Time: ~20ms (WASM) / ~10ms (WebGPU)
Model Size: 4MB
The CiFake model was trained to distinguish real images from AI-generated ones using the CIFAKE dataset.
It's extremely lightweight and fast, but the low input resolution (32×32) means it can miss fine details.
GenImage Model (Alternative)
Model Specifications
Architecture: MobileViT
Input Size: 64×64×3 RGB
Parameters: ~5 million
Training Data: GenImage dataset
Inference Time: ~50ms (WASM) / ~25ms (WebGPU)
Model Size: 18MB
The GenImage model takes 64×64 input (four times as many pixels as CiFake) and is trained on a different dataset. It may perform
better on certain types of generated images, but it is slower.
Known Limitation: Both models have relatively low input resolutions (32×32 and 64×64), which
can lead to false positives. We're actively researching higher-resolution models with better accuracy.
See our limitations documentation for full transparency.
Hash-Based Privacy System
Why Hashes Instead of Images?
Instead of uploading your images to our servers (which would be a privacy nightmare), we compute
mathematical "fingerprints" called hashes. These hashes can be used to identify similar images
without ever seeing the actual image content.
The 6 Hashes We Use
- Mean Hash (pHash): Average pixel values in an 8×8 grid (see the sketch after this list)
- Gradient Hash: Edge detection-based fingerprint
- Double Gradient Hash: Enhanced gradient with dual passes
- Block Hash: Block-median based hash
- DCT Hash: Discrete Cosine Transform-based fingerprint
- BLAKE3 Content Hash: Exact cryptographic hash for deduplication
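To make the idea concrete, here is a sketch of the mean-hash computation. It assumes the image has already been downsampled to an 8×8 grayscale grid, and it shows why the hash values below are 64-bit numbers:

Mean Hash Sketch
// `gray` holds 64 grayscale values, one per cell of the 8×8 grid.
function meanHash(gray: Uint8Array): bigint {
  const mean = gray.reduce((sum, v) => sum + v, 0) / gray.length;
  let hash = 0n;
  for (const v of gray) {
    hash = (hash << 1n) | (v > mean ? 1n : 0n); // one bit per cell
  }
  return hash; // a 64-bit value, like the decimal strings shown below
}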
Example Hash Vector
{
  "mean": "18379468920823898112",
  "gradient": "18015498021093556224",
  "doubleGradient": "18374389475892961280",
  "block": "18302628773641904128",
  "dct": "18374673854875439104",
  "blake3": "a7f3d8c9e2b4f1a6..." // 64 hex characters
}
These hashes allow us to detect if you've seen the same (or very similar) image before without
storing or transmitting the actual image. The BLAKE3 hash is used for exact matches, while the
perceptual hashes can detect near-duplicates and edited versions.
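Near-duplicate matching reduces to counting differing bits: two perceptual hashes that disagree in only a few positions almost certainly describe the same underlying image. The threshold below is an illustrative assumption:

Near-Duplicate Check Sketch
// Hamming distance between two 64-bit perceptual hashes.
function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b;
  let bits = 0;
  while (x > 0n) {
    bits += Number(x & 1n);
    x >>= 1n;
  }
  return bits;
}

// Threshold is an assumption; lower values mean stricter matching.
const isNearDuplicate = (a: bigint, b: bigint): boolean =>
  hammingDistance(a, b) <= 10;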
Community Database
How It Works
The community database at api.qwip.io stores detection results contributed by users:
- No images stored: Only hashes and metadata
- No user tracking: Contributions are completely anonymous
- Vote aggregation: Multiple detections are combined using weighted averaging
- Open API: Anyone can query and contribute (rate-limited by IP)
Database Schema
Images Table
CREATE TABLE images (
    blake3_hash          VARCHAR(64) PRIMARY KEY,
    hash_mean            BIGINT,
    hash_gradient        BIGINT,
    hash_double_gradient BIGINT,
    hash_block           BIGINT,
    hash_dct             BIGINT,
    likely_ai            BOOLEAN,
    confidence           FLOAT,
    vote_count           INTEGER,
    model_used           VARCHAR(50),
    first_seen           TIMESTAMP,
    last_seen            TIMESTAMP
);
When multiple users analyze the same image, their confidence scores are aggregated:
Vote Aggregation Algorithm
confidence = (confidence × vote_count + incoming_confidence)
           ÷ (vote_count + 1)
vote_count = vote_count + 1
This running average weights every vote equally, so the confidence score becomes more reliable
as more people analyze an image.
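The same update in code, with a worked example:

Aggregation Sketch
interface Entry {
  confidence: number;
  voteCount: number;
}

// Incremental mean: every vote carries the same weight as every prior one.
function aggregate(entry: Entry, incoming: number): Entry {
  return {
    confidence: (entry.confidence * entry.voteCount + incoming) / (entry.voteCount + 1),
    voteCount: entry.voteCount + 1,
  };
}

// Three prior votes averaging 0.80 plus a new vote of 0.60:
// (0.80 × 3 + 0.60) ÷ 4 = 0.75 across four votes.
aggregate({ confidence: 0.8, voteCount: 3 }, 0.6);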
Performance Optimizations
What We Do to Keep Things Fast
- Lazy loading detection: Only processes images when they become visible
- Intelligent caching: Results cached locally to avoid re-processing (see the sketch after this list)
- Size filtering: Skips tiny icons and massive images
- WebGPU acceleration: Optional GPU acceleration on Chrome 120+
- Race condition prevention: Atomic checks prevent duplicate processing
- Memory management: Aggressive cleanup of blob URLs and DOM elements
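As an example of the caching layer, a local result store can be keyed by BLAKE3 hash; the use of chrome.storage.local and the key scheme are assumptions:

Cache Sketch
// Hypothetical local cache of confidence scores, keyed by BLAKE3 hash.
async function getCachedConfidence(blake3: string): Promise<number | undefined> {
  const key = `result:${blake3}`;
  const stored = await chrome.storage.local.get(key);
  return stored[key];
}

async function setCachedConfidence(blake3: string, confidence: number): Promise<void> {
  await chrome.storage.local.set({ [`result:${blake3}`]: confidence });
}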
Typical Performance
Processing Times (Average)
Hash computation: 5-10ms
Model inference (CiFake): 20ms WASM / 10ms WebGPU
Model inference (GenImage): 50ms WASM / 25ms WebGPU
Server query: 50-200ms (if not cached)
Total: 25-260ms per image
What Happens Offline?
The extension's core detection works completely offline! Here's what happens when you're not connected to the internet:
- ✅ Local ML inference still works (no internet needed)
- ✅ Images are still analyzed and annotated
- ✅ All privacy protections remain active
- ❌ Can't query community database for cached results
- ❌ Can't contribute results back to community
When you reconnect, pending contributions are NOT automatically sent (we never queue data without your knowledge).
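A sketch of how the network-dependent steps can degrade gracefully, reusing the queryCommunity helper from the lookup sketch above; navigator.onLine is the standard browser connectivity signal:

Offline Guard Sketch
async function maybeQueryCommunity(blake3: string): Promise<CommunityResult | null> {
  if (!navigator.onLine) return null; // stay fully local while offline
  try {
    return await queryCommunity(blake3);
  } catch {
    return null; // treat network failures like being offline: fall back to local ML
  }
}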
Want More Technical Details?
Check out our full documentation and open-source code.