Title: Improving the Accuracy and Efficiency of Injury Coding using Machine Learning, Expert Knowledge, and Linguistics
Speaker: Mr Gaurav Nanda, School of Industrial Engineering, Purdue University
Time: 9:30 am, Friday, 25/11/16
Venue: IEOR Teaching Lab, Ground floor, IEOR Bldg
Abstract: Injury surveillance data, which include external cause of injury codes (E-codes) that indicate the specific cause of an injury (such as a fall, cut, burn, or electric shock), are valuable for analyses that identify the primary causes of injuries and direct prevention efforts. E-codes are typically assigned to injury records by trained human coders based on the injury narrative, a process that is expensive in time and resources and that requires expertise for accurate assignment.
Machine Learning (ML) models such as Naïve Bayes, Support Vector Machine, and Logistic Regression, trained on coded injury data, offer a promising alternative for quickly assigning E-codes to injury cases based on the injury narrative, but they are not accurate enough to be used autonomously. This highlights the need for a semi-automated system that can assign E-codes to a large portion of the data with high accuracy and can efficiently filter the remaining cases for human review.
This seminar presents an evaluation of different strategies for improving the prediction accuracy of ML models and for efficiently filtering cases for manual review. For improving prediction performance, increasing the size of the training data and applying unsupervised linguistic approaches helped to some extent, but the gains were limited. Applying linguistic rules based on the causal model of the E-code was found to be effective.
For efficiently filtering cases for manual review, three approaches were found to be effective: a) checking agreement between the predictions of different ML models, and of models trained on balanced and unbalanced training sets; b) setting a threshold on the model's prediction probability; and c) using stage-wise hierarchical classification.
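As a rough illustration of how filtering approaches a) and b) could be combined, the sketch below auto-assigns an E-code only when two models agree and both are confident, and sends everything else to a human coder. It is not the speaker's implementation: the `Prediction` class, the `route_case` function, the `"REVIEW"` marker, the 0.9 threshold, and the example predictions are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    code: str    # predicted E-code label, e.g. "fall"
    prob: float  # model's probability for that label


def route_case(pred_a: Prediction, pred_b: Prediction, threshold: float = 0.9) -> str:
    """Return the agreed E-code, or "REVIEW" if a human coder should check the case.

    Assumption: two models and a single shared threshold; both are illustrative.
    """
    # Approach a): the two models must predict the same E-code.
    if pred_a.code != pred_b.code:
        return "REVIEW"
    # Approach b): the less confident model must still clear the threshold.
    if min(pred_a.prob, pred_b.prob) < threshold:
        return "REVIEW"
    return pred_a.code


# Made-up predictions for three injury narratives.
cases = [
    (Prediction("fall", 0.97), Prediction("fall", 0.95)),  # agree, confident
    (Prediction("cut", 0.97), Prediction("burn", 0.92)),   # disagree
    (Prediction("burn", 0.60), Prediction("burn", 0.95)),  # agree, one unsure
]
decisions = [route_case(a, b) for a, b in cases]
print(decisions)
```

Raising the threshold sends more cases to manual review in exchange for higher accuracy on the automatically coded portion; the right trade-off would have to be chosen on held-out coded data.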
Speaker bio: Gaurav Nanda is a Ph.D. candidate in the School of Industrial Engineering at Purdue University. His research interests include data-based injury surveillance, safety analytics, text mining, predictive modelling, and collaborative information systems. Before starting his Ph.D., Gaurav worked in the software industry for five years. He completed his B.Tech. in Agricultural and Food Engineering and M.Tech. (Dual Degree) in Water Resources Development and Management from IIT Kharagpur.