The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional data analysis has difficulty identifying outliers and patterns in big data and in data with multiple exposure / outcome variables. Moreover, analysis rules for surveys and questionnaires, currently common methods of data collection, are essentially missing. Consequently, proper data-based health decisions will soon be impossible.
Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning methods. This was the main incentive for the authors to complete a series of three textbooks entitled "Machine Learning in Medicine Part One, Two and Three" (Springer Heidelberg Germany, 2012-2013), describing in a nonmathematical way over sixty machine learning methodologies, as available in SPSS statistical software and other major software programs. Although the textbooks were well received, it came to our attention that physicians and students often lacked the time to read the entire books, and requested a small book without background information and theoretical discussions, highlighting the technical details.
For this reason we produced a 100-page cookbook, entitled "Machine Learning in Medicine - Cookbook One", with data examples available at extras.springer.com for self-assessment and with reference to the above textbooks for background information. Already at the completion of this cookbook we came to realize that many essential methods were not covered. The current volume, entitled "Machine Learning in Medicine - Cookbook Two", is complementary to the first and is also intended to provide a more balanced view of the field. It is thus a must-read not only for physicians and students, but also for anyone involved in the process and progress of health and health care.
Like "Machine Learning in Medicine - Cookbook One", the current work describes stepwise analyses of over twenty machine learning methods that are, likewise, based on the three major machine learning methodologies:
- Cluster methodologies (Chaps. 1-3)
- Linear methodologies (Chaps. 4-11)
- Rules methodologies (Chaps. 12-20)
At extras.springer.com the data files of the examples are given, as well as XML (eXtensible Markup Language), SPS (Syntax) and ZIP (compressed) files for outcome predictions in future patients. In addition to condensed versions of the methods fully described in the above three textbooks, an introduction to SPSS Modeler (SPSS' data mining workbench) is given in Chaps. 15, 18 and 19, while improved statistical methods like various automated analyses and Monte Carlo simulation models are in Chaps. 1, 5, 7 and 8.
We should emphasize that all of the methods described have been successfully applied in practice by the authors, both of them professors in applied statistics and machine learning at the European Community College of Pharmaceutical Medicine in Lyon, France. We recommend the current work not only as a training companion to investigators and students, because of the many step-by-step analyses given, but also as a brief introductory text to jaded clinicians new to the methods. For the latter purpose, background and theoretical information have been replaced with the appropriate references to the above textbooks, while single sections addressing "general purposes", "main scientific questions" and "conclusions" are given in their place.
Finally, we will demonstrate that modern machine learning sometimes performs better than traditional statistics does. Machine learning may offer few options for adjusting for confounding and interaction, but propensity scores and interaction variables can be added to almost any machine learning method.
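For readers who prefer to see this last point outside SPSS, the following is a minimal Python sketch of how a propensity score and an interaction variable might be appended as extra predictors before any machine learning method is run. It is an illustration under stated assumptions, not the procedure used in the chapters: the function names `fit_propensity` and `augment` are hypothetical, the propensity score is estimated here with a plain logistic regression fitted by gradient descent, the interaction variable is taken as the product of two predictors, and the six-patient data set is made up.

```python
import math


def fit_propensity(rows, treated, epochs=500, lr=0.1):
    """Fit a logistic regression for P(treated | covariates) by gradient descent.

    Returns a function that maps one covariate row to its propensity score.
    (Hypothetical helper, written for illustration only.)
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, t in zip(rows, treated):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - t  # gradient of the log-loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n

    def score(x):
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    return score


def augment(rows, treated, score, interaction=(0, 1)):
    """Append treatment, propensity score and one interaction variable to each row."""
    i, j = interaction
    return [list(x) + [t, score(x), x[i] * x[j]] for x, t in zip(rows, treated)]


# Made-up toy data: [age in units of 100 years, sex (0/1)] and a treatment indicator.
covariates = [[0.3, 0], [0.4, 1], [0.5, 0], [0.6, 1], [0.7, 0], [0.8, 1]]
treated = [0, 0, 0, 1, 1, 1]

score = fit_propensity(covariates, treated)
augmented = augment(covariates, treated, score)
for row in augmented:
    print(row)
```

Each augmented row holds the original covariates followed by the treatment indicator, the estimated propensity score, and the age-by-sex interaction variable. Such rows could then be passed as predictors to whichever machine learning method is being used.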