Behind the Scenes: The Making of MindEye2

15 Apr 2025

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References

A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

A.1 Author Contributions

PSS: project lead, drafted the initial manuscript, and contributed to all parts of MindEye2 development.

MT (core contributor): MindEye2 ablations, SDXL unCLIP vs. Versatile Diffusion comparisons, improved distributed training code, and experimented with approaches not used in the final model, including training custom ControlNet and T2I-Adapters, using retrieval on COCO CLIP captions, and using diffusion priors to align fMRI to text embeddings.

CKTV (core contributor): retrained and evaluated MindEye1 models, image captioning evaluations and writing, improved manuscript formatting, ROI-optimized stimuli experiments, and experimented with approaches not used in the final model, including trying out different pretrained model embeddings, T2I-Adapters and depth conditioning, past/future timepoints as additional conditioning, BLIP-2 for text prediction, and behavioral embeddings.

RK (core contributor): brain correlations, human preference experiments, recalculated metrics for the 40-hour setting results of Ozcelik and VanRullen (2023) and Takagi and Nishimoto (2023), evaluations with varying amounts of training data across all models, assistance with data normalization, and significant contributions to manuscript writing.

TC: UMAP visualizations, improved the design for Figure 1, and experimented with approaches not used in the final model, including using past/future timepoints as additional conditioning and using flattened voxels in MNI space instead of native space.

AN: helped with ablations and experimented with replacing soft CLIP loss with soft SigLIP loss (Zhai et al., 2023) (not used in the final model).

CS: FAISS retrieval with MS-COCO (Appendix A.8) and experimented with approaches not used in the final model, including past/future timepoints as additional conditioning, BLIP-2 for text prediction, and behavioral embeddings.

JX: helped with ablations, manuscript revisions, and table formatting, and experimented with approaches not used in the final model, including BLIP-2 for text prediction, behavioral embeddings, and improvements to the model architecture.

TN: assisted with human preference experiments.

KN: oversaw the project, manuscript revisions and framing.

TMA: oversaw the project, manuscript revisions and framing, and helped keep the project on track through MedARC and Stability AI communication.

This paper is available on arXiv under the CC BY 4.0 DEED license.

Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC), core contributor;

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC), core contributor;

(4) Reese Kneeland, University of Minnesota, core contributor;

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).