
How LightCap Sees and Speaks: Mobile Magic in Just 188ms Per Image
27 May 2025
LightCap’s 188ms mobile inference, visual concept retrieval, and channel-attention visualizations demonstrate efficient, accurate captioning on COCO.

LightCap’s Success on nocaps: Limitations and Opportunities for Growth
27 May 2025
LightCap excels on nocaps across domains. Future work includes a more efficient CLIP encoder, end-to-end training, and additional pre-training data.

Not Just Small and Fast, But Smart Too: How LightCap Outperforms on Mobile
26 May 2025
LightCap: 188ms inference time on mobile and a 98% FLOPs reduction; outperforms SOTA on COCO and nocaps with a tiny model size.

What Makes LightCap Tick? Breaking Down the Numbers and Components
26 May 2025
LightCap, trained on 5.8M image-text pairs, excels on COCO and nocaps as measured by BLEU@4, METEOR, CIDEr, and SPICE; ablations show each module’s performance gain.

LightCap Framework: Lightweight Components for Efficient Image Captioning on Edge Devices
26 May 2025
LightCap uses CLIP’s grid features, a visual concept extractor, cross-modal modulator, TinyBERT fusion, and ensemble heads for efficient captioning.

A Survey of Image Captioning Techniques and Vision-Language Pre-training Strategies
26 May 2025
Reviews image captioning (detector-based vs. grid features) and vision-language pre-training (contrastive vs. fusion), positioning LightCap as a novel, efficient CLIP-based approach.

New AI "LightCap" Shrinks Image Captioning for Your Phone, Runs on CPU
26 May 2025
LightCap: a tiny, fast image captioner using CLIP & distillation. 75% smaller, SOTA on COCO (136.6 CIDEr), 188ms per image on CPU. Ready for mobile!

Human Evaluation Validates MindEye2's Superior Image Reconstruction Quality
17 Apr 2025
This section details two-alternative forced-choice experiments measuring human preference for MindEye2 reconstructions over random reconstructions.

Visualizing Brain Function: MindEye2 Reconstructions from ROI-Specific fMRI
17 Apr 2025
Explore MindEye2's ROI analysis, revealing the preferential stimuli associated with brain regions like V1, Face-ROI, Word-ROI, Place-ROI, and Body-ROI