Call for Papers: Special Issue on Automating Data Science


Machine learning applications involve phases of data exploration, data engineering, model building and deployment (possibly in an iterative fashion). A large fraction of the time on such projects is usually devoted to phases other than model building. Thus it is clear that data scientists are in urgent need of more powerful tools to help them throughout the entire data science process. While there has been significant progress in some areas, exemplified by the success of automated machine learning (AutoML), other areas, such as data understanding, data preparation, and deployment, still need fundamental research breakthroughs. There are significant gains to be made, with considerable potential impact in science, industry and society, should these aspects be successfully automated or semi-automated.


This special issue will focus on the key open research questions in the automation of data science, such as how to automate data collection/generation, data labeling, data wrangling, data preprocessing/augmentation, data quality evaluation, data debt, and data governance. Issues relating to deployment include understanding the patterns and models obtained, publishing them as building blocks for new discoveries (e.g., in scientific papers or reports), and validating and monitoring their operation. Given the importance of domain knowledge, integrating such knowledge and effectively involving humans in the loop are also topics very much in scope, as are ways to quantify data quality (or messiness) and track progress. Many of these areas are in a nascent stage, and we aim to further their development by knitting them together into a coherent whole and showcasing the best innovations and most original ideas.


The special issue will cover all areas of data science automation, but we especially welcome research that focuses on steps before and after modeling, dealing with "messy data", or extending the AutoML paradigm beyond supervised tasks.


Topics of interest include:

- Automating data wrangling

- Data integration via AI techniques (e.g., NLP)

- Merging data preparation into statistical learning

- Handling missing and anomalous values semi-automatically

- Using NLP for generating explanations and reports

- Incorporating domain knowledge into the automation of data science

- Semi-automating visualization

- Semi-automated machine learning

- Learning with non-normalized data

- Impact of data science automation on the work of data scientists


Contributions must contain new, unpublished, original and fundamental work related to the Machine Learning Journal's mission. All submissions will be reviewed using rigorous scientific criteria whereby the novelty of the contribution will be crucial.


Submission Instructions

Submit manuscripts to: http://mach.edmgr.com/. Select "SI: Automating Data Science" as the article type. Papers must be prepared in accordance with the Journal guidelines:

https://www.springer.com/journal/10994/submission-guidelines?IFA


Key Dates:

All dates are indicative, except for the Round 1 submission deadline, which is firm.


Round 1 submission deadline: January 15, 2022 

Round 1 author notification: March 28, 2022 

Round 2 submission deadline: June 15, 2022 

Round 2 author notification: September 30, 2022 


Guest Editors:

Tijl De Bie (Ghent University, Belgium)

Jose Hernandez-Orallo (Universitat Politecnica de Valencia, Spain)

Joaquin Vanschoren (Eindhoven University of Technology)

Gaël Varoquaux (INRIA)

Chris Williams (University of Edinburgh)