Call for Papers: Special Issue on the Promises and Dangers of Large Vision Models

Guest Editors

  • Kaiyang Zhou, Nanyang Technological University, Singapore
  • Ziwei Liu, Nanyang Technological University, Singapore
  • Xiaohua Zhai, Google Brain, Switzerland
  • Chunyuan Li, Microsoft Research, Redmond, US
  • Kate Saenko, Boston University, US

Computer vision, the science of teaching machines to understand the visual world, has witnessed in the past decade how the paradigm shift from hand-crafted methods to deep neural networks—known as deep learning—revolutionized the field, leading to breakthroughs across a wide range of vision problems. Recently, a new trend has sparked fresh interest from the community and may greatly impact the field in the long run: the scaling of vision models.

Specifically, the size of vision models has grown exponentially, from tens of millions of parameters to hundreds of millions or even billions, particularly after the emergence of Vision Transformers. Moreover, the scale and diversity of training data have also increased dramatically to match the growth in model capacity: not only in quantity (e.g., billions of web examples) but also in modality, such as combining images and language. For brevity, we refer to such models as Large Vision Models (LVMs), a term that covers both unimodal and multimodal vision models (e.g., vision-language models).

On the one hand, LVMs learned from broad data at scale have demonstrated great generalization capability: they can cope with a wide range of domains and scenarios, and can be adapted, with minimal tweaks, to handle multiple visual tasks, such as image classification/captioning/segmentation, object/keypoint detection, and depth/surface normal estimation. Furthermore, multimodal LVMs have also opened up numerous downstream zero-shot inference applications, such as open-vocabulary classification/detection/segmentation and image editing/generation.

On the other hand, LVMs come with challenges and risks that need to be addressed by the community: training is costly and has a negative environmental impact; LVMs are often too big to fine-tune on downstream datasets; the uneven distribution of web data may cause social biases (w.r.t. gender and race) and inequalities; the commonsense reasoning ability of LVMs still lags behind; and so on.

This special issue seeks original contributions towards advancing LVMs—in terms of development, evaluation, adaptation, applications, understanding, and so on—and addressing the potential negative impacts of LVMs.

In comparison to recent special issues in IJCV (e.g., Robust Vision and Multimodal Learning) and TPAMI (e.g., Transformer Models in Vision), this special issue differs significantly in its focus on recent advances in LVMs and their emergent capabilities and challenges. It aims to gather research that improves our understanding of LVMs, both empirically and theoretically, as well as how to better use them in practice to maximize the gains.

Aims and Scope

Topics of interest include (but are not limited to):

  • Training or adaptation methods for LVMs
  • LVM architecture designs (not limited to Transformer-based models)
  • Visualizing and interpreting LVMs
  • Emergent capabilities of LVMs
  • Applications and use cases of LVMs in computer vision
  • Theoretical insights into LVMs
  • Generalization and robustness of LVMs
  • Evaluation, biases, fairness, and safety of LVMs

Submission Timeline

  • Full paper submission deadline: extended to April 1st, 2023
  • Review deadline: May 30th, 2023
  • Author response deadline: June 26th, 2023
  • Final notification: July 26th, 2023
  • Final manuscript submission: August 26th, 2023

Author Resources

Authors are encouraged to submit high-quality, original work that has neither appeared in, nor is under consideration by, other journals. All papers will be reviewed following the standard reviewing procedures of the Journal. Papers must be prepared in accordance with the Journal guidelines: www.springer.com/11263

Springer provides a host of information about publishing in a Springer Journal on our Journal Author Resources page, including FAQs and Tutorials, along with Help and Support.