335 The Benefits of Metadata Analysis and Form Question Harmonization

Track: Contributed Abstracts
Wednesday, February 13, 2013, 6:45 PM-7:45 PM
Hall 1 (Salt Palace Convention Center)
Sandra Sorensen, CIBMTR - IT, National Marrow Donor Program, Minneapolis, MN
Robinette Renner, CIBMTR - IT, National Marrow Donor Program, Minneapolis, MN
BACKGROUND:  To collect high-quality data from multiple organizations, there must be a clear understanding of what data is needed and how it is to be reported.  Metadata, often described as “data about data,” describes the content, quality, and other attributes of the data being collected.  These attributes include maximum length, number of decimal places, data type (character, number, or date), and answer format (multiple choice vs. free text).  Historically, FormsNet (FN), the electronic data capture system of the Center for International Blood and Marrow Transplant Research (CIBMTR), stored the metadata at the question level.  Because data collection forms were created independently, differences in question design and metadata became problematic.  For example, primary disease is a list of valid values on one form and free text on another.  In an effort to improve data quality and facilitate analysis, the use of well-defined metadata and data standards took on a more prominent role. 
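
The kind of cross-form inconsistency described above can be illustrated with a minimal Python sketch.  It is not FormsNet code: the class name, field names, form names, and option values are all hypothetical, chosen only to mirror the metadata attributes listed in the text and the primary disease example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuestionMetadata:
    """Question-level metadata (hypothetical structure, for illustration only)."""
    form: str
    question: str
    data_type: str                        # "character", "number", or "date"
    max_length: Optional[int] = None
    decimal_places: Optional[int] = None
    valid_values: Optional[tuple] = None  # None means free text


def find_inconsistencies(questions):
    """Return questions whose metadata differs between forms, with the forms involved."""
    by_question = defaultdict(list)
    for q in questions:
        by_question[q.question].append(q)

    conflicts = {}
    for name, definitions in by_question.items():
        # Two definitions agree only if every metadata attribute matches.
        signatures = {
            (d.data_type, d.max_length, d.decimal_places, d.valid_values)
            for d in definitions
        }
        if len(signatures) > 1:
            conflicts[name] = sorted(d.form for d in definitions)
    return conflicts


# Hypothetical forms: primary disease is a value list on one and free text on the other.
questions = [
    QuestionMetadata("Form A", "primary disease", "character",
                     valid_values=("AML", "ALL", "CML")),
    QuestionMetadata("Form B", "primary disease", "character", max_length=100),
    QuestionMetadata("Form A", "date of transplant", "date"),
    QuestionMetadata("Form B", "date of transplant", "date"),
]

conflicts = find_inconsistencies(questions)
print(conflicts)  # {'primary disease': ['Form A', 'Form B']}
```

When metadata lives only at the question level, a comparison like this has to be run after the fact; nothing prevents the two definitions from drifting apart.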

METHODS:  Common data elements (CDEs) were created using the National Cancer Institute’s Cancer Data Standards Registry and Repository (caDSR).  A CDE consists of two parts:  the Data Element Concept describes the form question, and the Value Domain describes how the answer should be reported.  Each CDE contains the metadata required for a form question.  Questions requiring the same metadata are represented by a common CDE.  In addition, the metadata in FN was moved from the question level to a cross-form data dictionary and linked to CDEs to further standardize the required data and its format. 
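
The two-part CDE structure and the cross-form data dictionary can be sketched as follows.  This is an assumed, simplified model in Python, not the caDSR schema or the FormsNet implementation: the class names follow the terms used above, but the identifiers (`CDE-0001`, `primary_disease`), form and question labels, option values, and the `accepts` and `resolve` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElementConcept:
    description: str  # what the form question asks


@dataclass(frozen=True)
class ValueDomain:
    data_type: str                              # "character", "number", or "date"
    max_length: Optional[int] = None
    decimal_places: Optional[int] = None
    permissible_values: Optional[tuple] = None  # None means free text


@dataclass(frozen=True)
class CommonDataElement:
    cde_id: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def accepts(self, answer: str) -> bool:
        """Check an answer against the value domain (simplified, hypothetical rule)."""
        domain = self.value_domain
        if domain.permissible_values is not None:
            return answer in domain.permissible_values
        if domain.max_length is not None:
            return len(answer) <= domain.max_length
        return True


# One CDE holds the metadata once (hypothetical ID and option values).
primary_disease = CommonDataElement(
    cde_id="CDE-0001",
    concept=DataElementConcept("Primary disease for which the transplant was performed"),
    value_domain=ValueDomain("character", permissible_values=("AML", "ALL", "CML")),
)
cde_registry = {primary_disease.cde_id: primary_disease}

# Cross-form data dictionary: entry name -> CDE ID.
data_dictionary = {"primary_disease": "CDE-0001"}

# Form questions point at dictionary entries instead of carrying their own metadata.
form_questions = {
    ("Form A", "Q1"): "primary_disease",
    ("Form B", "Q7"): "primary_disease",
}


def resolve(form: str, question: str) -> CommonDataElement:
    """Follow form question -> data dictionary entry -> CDE."""
    return cde_registry[data_dictionary[form_questions[(form, question)]]]


same = resolve("Form A", "Q1") is resolve("Form B", "Q7")
print(same)  # True
```

Because both form questions resolve to the same CDE object, a change to the value domain is made in one place and applies to every form that uses it.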

RESULTS:  A review of the current baseline and follow-up forms indicates that harmonizing the data dictionary entries (DDCs) with the CDEs has led to an approximately 25% reduction (583 DDCs and 437 CDEs) in the number of data points being defined multiple times.  The result is more consistent forms and data collection. 
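
The reported percentage can be checked with a short calculation, assuming that the reduction is measured by comparing the 583 DDCs against the 437 CDEs that represent them (this reading is an assumption; the variable names are illustrative).

```python
ddc_count = 583  # data dictionary entries reported in the abstract
cde_count = 437  # common data elements reported in the abstract

# Relative reduction when 583 entries are represented by 437 CDEs.
reduction = (ddc_count - cde_count) / ddc_count
print(f"{reduction:.1%}")  # 25.0%
```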

CONCLUSION:  The CIBMTR's data collection forms include questions that are asked multiple times within and across forms.  To facilitate data entry and analysis, form inconsistencies needed to be addressed.  To help alleviate these issues, each data dictionary entry and its metadata are tied to a CDE.  In addition, a metadata review is now undertaken at each step in the form revision/development process to ensure that questions are harmonized, terminology is used consistently, question formats are standardized, and option values are semantically similar.  Exceptions are allowed only when clinical differences and regulatory compliance dictate.  The benefits of using well-defined metadata and data standards include unambiguous interpretation of data points, improved data exchange, easier data analysis, improved cross-form consistency, and the creation of a pool of data elements to be used for new form development.