
How LightCap Sees and Speaks: Mobile Magic in Just 188ms Per Image

27 May 2025

LightCap’s 188ms mobile inference, visual concept retrieval, and channel attention visualizations demonstrate efficient, accurate captioning on COCO.


LightCap’s Success on Nocaps: Limitations and Opportunities for Growth

27 May 2025

LightCap excels on nocaps across its in-, near-, and out-of-domain splits. Future work includes a more efficient CLIP, end-to-end training, and more pre-training data.


Not Just Small and Fast, But Smart Too: How LightCap Outperforms on Mobile

26 May 2025

LightCap: 188ms inference time on mobile, a 98% FLOPs reduction, and a tiny model size, while outperforming SOTA on COCO and nocaps.


What Makes LightCap Tick? Breaking Down the Numbers and Components

26 May 2025

LightCap, trained on 5.8M image-text pairs, excels on COCO and nocaps as measured by BLEU@4, METEOR, CIDEr, and SPICE; ablations show each module’s performance boost.


LightCap Framework: Lightweight Components for Efficient Image Captioning on Edge Devices

26 May 2025

LightCap combines CLIP’s grid features with a visual concept extractor, a cross-modal modulator, a TinyBERT fusion module, and ensemble heads for efficient captioning.


A Survey of Image Captioning Techniques and Vision-Language Pre-training Strategies

26 May 2025

Reviews image captioning (detector-based vs. grid features) and vision-language pre-training (contrastive vs. fusion), positioning LightCap as a novel, efficient CLIP-based approach.


New AI "LightCap" Shrinks Image Captioning for Your Phone, Runs on CPU

26 May 2025

LightCap: a tiny, fast image captioner using CLIP & distillation. 75% smaller, SOTA on COCO (136.6 CIDEr), 188ms per image on a mobile CPU. Ready for mobile!


Human Evaluation Validates MindEye2's Superior Image Reconstruction Quality

17 Apr 2025

This section details two-alternative forced-choice experiments assessing human preference for MindEye2 reconstructions over random reconstructions.


Visualizing Brain Function: MindEye2 Reconstructions from ROI-Specific fMRI

17 Apr 2025

Explore MindEye2's ROI analysis, revealing the preferential stimuli associated with brain regions like V1, Face-ROI, Word-ROI, Place-ROI, and Body-ROI.