The DFCI/BWCC HSCT program is projected to perform 525 transplants in 2012 and has performed 6000 transplants since its inception in 1972. The quality of this data had not previously been reviewed on a large scale; earlier reviews were smaller projects that examined selected data fields for limited patient sets. The accuracy of this data is paramount because it is used to analyze patient outcomes, policy compliance, and operational considerations.
The goal of this project was to develop a comprehensive and efficient method of data validation for DFCI’s internal HSCT repository and DFCI’s SCTOD data.
Methods:
Fifty-nine transplant-essential data fields were selected for analysis, including Day 0, Disease Status at Transplant, Best Response, aGVHD, and cGVHD. A program comparing DFCI’s internal repository data with DFCI’s CIBMTR data (retrieved with the Data Back to Center tool) was designed in Microsoft Access, accounting for slight differences in coding rules and logic between the two sources. In 2011, over 200,000 individual data points were compared. The analysis was repeated in 2012 with more recent data.
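The field-by-field comparison described above can be illustrated with a minimal Python sketch. This is not the Microsoft Access program used in the project; the field names, patient identifiers, sample values, and normalization rules are all hypothetical and stand in for the actual coding differences between the two sources.

```python
# Illustrative sketch only: field names, IDs, and normalization rules are
# hypothetical, not taken from the DFCI repository or the CIBMTR data.

# Map source-specific codings to a shared form before comparing, so that
# differences in coding rules are not counted as errors.
NORMALIZERS = {
    "kps": lambda v: None if v is None else int(v),
    "day0": lambda v: None if v is None else str(v).strip(),
}


def normalize(field, value):
    """Apply the field's normalizer, or return the value unchanged."""
    return NORMALIZERS.get(field, lambda v: v)(value)


def compare(internal, external, fields):
    """Compare the selected fields for patients present in both sources.

    Returns per-field [mismatches, compared] counts and a list of
    (patient_id, field, internal_value, external_value) discrepancies.
    """
    counts = {f: [0, 0] for f in fields}
    discrepancies = []
    for pid in sorted(internal.keys() & external.keys()):
        for f in fields:
            a = normalize(f, internal[pid].get(f))
            b = normalize(f, external[pid].get(f))
            counts[f][1] += 1
            if a != b:
                counts[f][0] += 1
                discrepancies.append((pid, f, a, b))
    return counts, discrepancies


def error_rates(counts):
    """Percent of compared data points that disagree, per field."""
    return {f: 100.0 * bad / total if total else 0.0
            for f, (bad, total) in counts.items()}


internal = {
    "P1": {"kps": "90", "day0": "2011-03-01"},
    "P2": {"kps": 80, "day0": "2011-04-15"},
}
external = {
    "P1": {"kps": 90, "day0": "2011-03-01 "},
    "P2": {"kps": 70, "day0": "2011-04-15"},
}

counts, discrepancies = compare(internal, external, ["kps", "day0"])
rates = error_rates(counts)
```

A discrepancy found this way shows only that the two sources disagree; deciding which value is correct still requires review against the source record.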
Results:
In 2011, the pre-HSCT and post-HSCT data sets had overall error rates of 0.51% and 0.77%, respectively. The pre-HSCT fields with error rates above 2% were Diagnosis Date (2.16%), KPS (2.23%), and Reason RIC (2.22%). The post-HSCT fields with error rates above 2% were Cause of Death (3.27%) and Date of Death (3.94%). All errors were corrected, and areas for staff education and codebook improvement were identified and addressed.
In 2012, for data reported before the educational updates, the error rates for the previous year’s high-error fields were Diagnosis Date (3.71%), KPS (0.80%), Reason RIC (2.14%), Cause of Death (2.34%), and Date of Death (1.37%). For data reported after the educational updates, coding accuracy improved for Diagnosis Date (0.70%) and Reason RIC (1.67%); the KPS error rate was 1.69%. Very limited post-HSCT data was available for the period after the educational updates.
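The before and after figures for the three pre-HSCT fields can be tabulated with a short Python sketch. The error rates are those reported above; applying the 2% threshold to individual fields is an assumption made here for illustration, and the function name is hypothetical.

```python
# Error rates (%) from the 2012 analysis, for data reported before and
# after the educational updates.
before = {"Diagnosis Date": 3.71, "KPS": 0.80, "Reason RIC": 2.14}
after = {"Diagnosis Date": 0.70, "KPS": 1.69, "Reason RIC": 1.67}

# Assumption: the 2% threshold is applied per field for this illustration.
TARGET = 2.0


def summarize(before, after, target=TARGET):
    """Return the change in error rate (percentage points) for each field."""
    rows = {}
    for field in before:
        change = round(after[field] - before[field], 2)
        rows[field] = {
            "change": change,
            "improved": change < 0,
            "meets_target": after[field] <= target,
        }
    return rows


summary = summarize(before, after)
for field, row in summary.items():
    print(f"{field}: {row['change']:+.2f} points, "
          f"meets target: {row['meets_target']}")
```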
Conclusion:
The pre-HSCT and post-HSCT data sets for DFCI’s internally and externally reported data had overall error rates well below the HSCT Program’s target of 2% or lower. When the analysis was repeated after staff education and codebook revisions, data accuracy improved. Comparing similar data entered into different databases is a valuable tool for correcting existing data errors and for improving future data accuracy.