Assessing Health Related Quality of Life in Persons with Diabetes: A Comparison of Generic Measures

Methods: Subjects were identified from the National Health Measurement Study (NHMS). Diabetes severity was defined as no diabetes, diabetes without insulin, and diabetes with use of insulin. Unadjusted and adjusted mean differences between the diabetes severity groups were estimated for 11 generic HRQoL measures. Effect sizes were calculated to estimate standardized group differences.


INTRODUCTION
In 2014, 12.3% of the U.S. adult population (about 28.9 million individuals) had diabetes mellitus [1]. Diabetes is known to be associated with significant morbidity, mortality, and resource utilization. It is a leading cause of kidney failure, blindness, heart disease, stroke, and nontraumatic lower limb amputation, and it is the seventh leading cause of death in the United States [1].
Diabetes has a significant impact not only on the physical health of patients but also on their social and emotional health; as such, Health Related Quality of Life (HRQoL) is a crucial health outcome in diabetes. Generic health measures are commonly used to assess HRQoL in population studies of persons with diabetes and other chronic illnesses. One of the main benefits of generic measures over diabetes-specific measures is the ability to compare HRQoL across disease states. Given the prevalence of diabetes in the US population as well as its influence on the increasing cost of healthcare, comparisons of HRQoL outcomes across disease states are important for making policy decisions. Currently, it is unclear which generic instrument is best suited for assessing HRQoL in a population of patients with diabetes. The EuroQoL 5-Dimension (EQ-5D) and the Medical Outcomes Study Short-Form 36 (SF-36v2) are commonly used as measures of HRQoL in patients with diabetes and have been found to be valid in these populations [2-12]. The Health Utilities Index Mark 2 and 3 (HUI2 and HUI3) have also demonstrated validity in patients with type 2 diabetes [13].
Numerous comparisons of validity and discriminative ability have been made between the most widely used generic measures: the EQ-5D, the SF-36, and, to some extent, the HUI2 and HUI3. It is well documented that most generic measures show lower HRQoL in patients with diabetes than in those without diabetes [3-11, 13-15]; however, there is mixed evidence about the ability of these generic measures to discriminate among levels of severity. In general, the SF-36 and its variants (the SF-12 and SF-6D) do not differentiate well among levels of severity [16-18], though Jacobson and colleagues did find that patients on diet treatment alone had higher general health perception scores than those on insulin [16]. There is some evidence that the SF-12, as traditionally scored, does not discriminate between levels of diabetes severity [17]. Redekop and colleagues, however, found that patients taking insulin had lower HRQoL scores on the EQ-5D than patients with diabetes who were not on insulin [19]. Kontodimopoulos and colleagues used the presence of various comorbid conditions or diabetic complications as an indicator of severity and found lower EQ-5D and SF-6D scores for patients with coronary heart disease (CHD), arthropathy, or diabetic foot compared to those without each of those conditions [20].
Little is known about whether other commonly used generic measures, namely the Quality of Well-Being Scale (QWB-SA) and the Health and Activities Limitation Index (HALex), provide consistent results in their assessment of HRQoL in the diabetic population, or whether these measures are sensitive enough to differentiate between those with less and more severe diabetes. Though the prevailing wisdom is that diabetes-specific measures are more appropriate for assessing HRQoL in persons with diabetes, there are times when these measures are not available to researchers because of monetary restrictions, the use of secondary data sources, or the desire to compare HRQoL across multiple disease states. Because generic measures continue to be used in diabetes-relevant research, it is important to understand how these measures perform, both comparatively and individually, within a diabetes-specific sample. The objective of this study was to assess differences in estimated HRQoL/health status in a population-based sample of individuals with diabetes and within levels of diabetes disease severity across various generic measures of HRQoL/health status. This study was deemed exempt by the Saint Louis University Institutional Review Board.

MATERIALS AND METHODS
Data came from the National Health Measurement Study (NHMS). Briefly, the NHMS was a random-digit-dial telephone interview of a sample of non-institutionalized adults living in the contiguous United States in 2005-2006. The sampling and weighting scheme has been described previously [21]. Of the 29,844 households deemed potential contacts, 15,450 (54%) could not be contacted and 14,394 were reached. Of those reached, 11,656 completed the screening process, and 6,822 of those had at least one eligible household member. Of those with an eligible household member, 4,334 agreed to begin the interview, and 3,844 eligible participants completed it. Two participants had missing data on the diabetes variable, leaving a final analytic sample of 3,842.

Generic Measures of HRQoL
Participants in the NHMS completed the SF-36v2™ [22], the Health Utilities Index version 2 [23] and version 3 [24] (HUI2 and HUI3), the EQ-5D [25], the Quality of Well-Being Scale (QWB-SA) [26], and the Health and Activities Limitation Index (HALex) [27]. Scoring algorithms for each of the preference-based utility measures can be found in the NHMS study description [21].

SF-36v2, SF-12, and SF-6D
The SF-36v2 is a commonly used 36-item generic HRQoL instrument that measures eight domains of health related quality of life: Physical Functioning, Role-Physical, Bodily Pain, General Health, Vitality, Social Functioning, Role-Emotional, and Mental Health. The eight domains are combined to produce two summary health measures: a Physical Component Score (PCS; Physical Functioning, Role-Physical, Bodily Pain, and General Health domains) and a Mental Component Score (MCS; Vitality, Social Functioning, Role-Emotional, and Mental Health domains). The SF-12v2 is a shortened version computed from a subset of the 36 items that make up the SF-36v2, and it also provides a summary PCS and MCS. For both the SF-36v2 and the SF-12, the MCS and the PCS range from 0 to 100. The time period assessed by the SF-36v2, the SF-12, and the SF-6D includes rating health "in general," how limited respondents are by their health now, and how much their health has limited them in the past 4 weeks.
The SF-6D utility index [28] is a preference-based utility measure that can be calculated from both the SF-36v2 and the SF-12. The SF-6D is computed from a subset of questions from the SF-36v2 or the SF-12 and reduces the domains from 8 to 6. Unlike the SF-36v2 and the SF-12, the scoring algorithm for the SF-6D produces utility scores ranging from 0.30 to 1.00. The scoring algorithm for utility scores was derived from standard gamble assessments.

HUI2 and HUI3
The HUI2 and HUI3 are preference-based utility measures that (theoretically) are valued between 0 (dead) and 1 (perfect health). Because of their scoring algorithms, the HUI2 and HUI3 actually allow scores below 0 (states worse than death), with possible scores ranging from -0.03 to 1.0 and from -0.36 to 1.0, respectively [23,24]. Utility score algorithms were derived from standard gamble assessments for both measures. The HUI2 assesses health status on 6 domains (sensation, mobility, emotion, cognition, self-care, and pain), whereas the HUI3 defines health on 8 domains (vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain). Both the HUI2 and the HUI3 ask responders to use the past week as the time point of reference.

EQ-5D
The EQ-5D is a preference-based utility measure examining 5 domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and is scored using an algorithm derived from time tradeoff assessments [25]. Scores from the EQ-5D range from 0 to 1.0. The time point of reference for the EQ-5D is the participant's health on that day. The EQ-5D measure used for this study provided 3-level response choices for each item (no problems, moderate problems, severe problems). A more recent 5-level version of the EQ-5D has since been released, but it was not available at the time of data collection for the NHMS [29].

QWB-SA
The QWB-SA is a preference-based utility measure that assesses three domains of functioning (mobility, physical activity, and social activity) and combines those three domains with a symptoms and health problems checklist. The final scoring algorithm provides weights for the domains as well as for each symptom or health problem and produces a final summary score ranging between 0.09 and 1.0 [26]. The QWB-SA asks responders to reference their health over the past three days.

HALex
The HALex is the summary index used for the National Health Interview Survey [27]. The time period of reference is "your health in general". The measure assesses 2 domains: activity limitations and self-reported health. The scoring algorithm for the HALex was developed ad hoc by the NHMS authors [19] using preference data from the HUI2. The final summary score ranged from 0.10 to 1.0.
In addition to the HRQoL measures, the NHMS collected data on eleven common health conditions, including diabetes. Persons were classified as having diabetes if they answered "yes" to the question "Have you ever been told by a doctor or other health professional that you have diabetes?" Diabetes severity was grouped as follows: no self-reported diabetes (n = 3116), self-reported diabetes without insulin (n = 529), and self-reported diabetes with use of insulin (n = 197).
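The score ranges quoted in the instrument descriptions above can be gathered into a small lookup for range-checking computed scores. The Python sketch below is illustrative only: it reflects one plausible reading of the 11 measures, and the dictionary keys (in particular the label `SF-6D36` for the SF-6D derived from the SF-36v2) and the helper `in_range` are hypothetical names, not part of the NHMS data set or its scoring algorithms.

```python
# Possible score ranges as stated in the text above.
# Keys are illustrative labels chosen for this sketch.
SCORE_RANGES = {
    "SF-36v2 PCS": (0.0, 100.0),
    "SF-36v2 MCS": (0.0, 100.0),
    "SF-12 PCS": (0.0, 100.0),
    "SF-12 MCS": (0.0, 100.0),
    "SF-6D36": (0.30, 1.00),  # assumed label: SF-6D computed from the SF-36v2
    "SF-6D12": (0.30, 1.00),  # SF-6D computed from the SF-12
    "HUI2": (-0.03, 1.0),
    "HUI3": (-0.36, 1.0),
    "EQ-5D": (0.0, 1.0),
    "QWB-SA": (0.09, 1.0),
    "HALex": (0.10, 1.0),
}


def in_range(measure: str, score: float) -> bool:
    """Return True if the score lies within the stated range for the measure."""
    low, high = SCORE_RANGES[measure]
    return low <= score <= high
```

A check such as `in_range("HUI3", -0.2)` makes the point that only the HUI2 and HUI3 admit scores below 0 among the measures described here.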
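The three-level severity grouping can be sketched as a simple classification rule. This is a minimal illustration, assuming each respondent record carries two yes/no answers; the parameter names `told_has_diabetes` and `uses_insulin` are hypothetical, and the text does not describe how insulin use was asked.

```python
def diabetes_severity(told_has_diabetes: bool, uses_insulin: bool) -> str:
    """Assign one of the three severity groups described in the text."""
    if not told_has_diabetes:
        return "no diabetes"
    if uses_insulin:
        return "diabetes with insulin"
    return "diabetes without insulin"


# Group sizes reported in the text; they should sum to the analytic sample.
GROUP_SIZES = {
    "no diabetes": 3116,
    "diabetes without insulin": 529,
    "diabetes with insulin": 197,
}
total = sum(GROUP_SIZES.values())
```

The group sizes sum to the final analytic sample of 3,842 reported in the description of the NHMS data.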

Statistical Analysis
All analyses were performed using the survey procedures in SAS version 9.3 (SAS Institute, Cary, NC) and used the trimmed, post-stratification sampling weights to produce nationally representative estimates. Weights were provided in the NHMS database to account for the stratified sampling scheme and are representative of the US population for the year 2000. There were no missing data for the diabetes, age, sex, or race variables in the study. There were limited "don't know" and "refused" responses for the comorbid conditions. For the purposes of calculating the final summative comorbid condition variable, the missing responses were taken as absence of the condition. No comorbid condition variable had more than 17 missing responses, and there was no statistically significant difference in sex, race, or diabetes status between those with and without missing data on chronic conditions (data not shown). Those with and without missing data did differ significantly in age, with older respondents more likely to have missing data (χ² = 35.0, p < .0001).
Weighted means and standard deviations of the score for each measure, stratified by diabetes severity group, were calculated. Unadjusted and adjusted differences in least squares means between each of the diabetes severity groups were estimated for each measure. Three linear regression models were estimated for each of the health measures: (1) an unadjusted model, (2) a model adjusted for age, sex, and race, and (3) model 2 plus the presence of comorbid conditions classified as none, 1-2, or 3 or more from the available conditions in the NHMS data set (arthritis, coronary heart disease, depression, stroke, eye disease, sleep disorders, thyroid disorder, respiratory disease, ulcer, and back pain).
Finally, effect sizes were calculated to estimate standardized group differences from model 3 by dividing the difference of least squares means by the residual standard deviation of the model. In this context, the effect size is an indicator of the measure's ability to discriminate between known groups. Using Cohen's guidelines, an effect size of 0.2-0.5 is considered small, 0.5-0.8 medium, and > 0.8 large [30]. An alpha of 0.01 was used to assess statistical significance for all comparisons.
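The comorbidity grouping and the effect size calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS code used in the study: all function names are hypothetical, the label for values under 0.2 is an assumption (the text gives no name for that band), and the handling of the shared boundary at 0.5 is a choice made here for the sketch.

```python
def comorbidity_category(conditions: dict) -> str:
    """Classify the number of comorbid conditions as none, 1-2, or 3 or more.

    `conditions` maps a condition name to True, False, or None, where None
    stands for a "don't know" or "refused" response. Following the text,
    missing responses are treated as absence of the condition.
    """
    count = sum(1 for present in conditions.values() if present is True)
    if count == 0:
        return "none"
    if count <= 2:
        return "1-2"
    return "3 or more"


def effect_size(lsmean_a: float, lsmean_b: float, residual_sd: float) -> float:
    """Difference of least squares means divided by the residual SD."""
    if residual_sd <= 0:
        raise ValueError("residual_sd must be positive")
    return (lsmean_a - lsmean_b) / residual_sd


def cohen_label(d: float) -> str:
    """Label an effect size using the thresholds quoted in the text."""
    magnitude = abs(d)
    if magnitude > 0.8:
        return "large"
    if magnitude >= 0.5:  # assumption: 0.5 itself is counted as medium
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "below small"  # assumed label; not named in the text
```

For example, `cohen_label(effect_size(0.80, 0.70, 0.25))` uses made-up least squares means and a made-up residual standard deviation, not values from this study, and falls in the small band.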

RESULTS
Demographics for the sample stratified by diabetes severity can be found in Table 1. Persons on insulin were more often female, were older, and had a higher proportion of other chronic conditions, including back pain, respiratory disease, eye disease, and coronary heart disease (unweighted). Forty-five percent of patients on insulin were White, compared to 57% of those not on insulin and 70% of those without diabetes (unweighted). Little difference was observed in insurance status by diabetes severity group. For the preference-based utility measures, the EQ-5D produced higher estimates of HRQoL than the other measures. Mean scores for each generic measure weighted to the US population and stratified by diabetes severity are reported (Table 2). Across all measures, persons on insulin demonstrated lower HRQoL scores compared to those not taking insulin. Unadjusted and adjusted mean score differences between the diabetes severity groups weighted to the US population were calculated (Table 3). In the unadjusted model, persons with diabetes taking insulin had statistically significantly lower mean HRQoL scores than persons without diabetes across all measures. Persons with diabetes but not taking insulin demonstrated statistically significantly lower HRQoL scores than persons without diabetes for all measures with the exception of the SF-12 MCS and SF-36v2 MCS. Only the HALex demonstrated a statistically significant difference between persons with diabetes taking insulin and those with diabetes without insulin in the unadjusted analysis (mean difference 0.14, p = .002).
After adjusting for age, sex, and race (model 2), results were similar to the unadjusted analysis. All measures indicated that persons with diabetes taking insulin had statistically significantly lower HRQoL scores than persons without diabetes. Persons with diabetes but not taking insulin had lower HRQoL scores than persons without diabetes for all measures except the SF-12 MCS and SF-36v2 MCS. The HALex was the only index that still showed a statistically significant mean difference between persons with diabetes taking insulin and those with diabetes but not on insulin (p = 0.002).
After adjustment for additional comorbid conditions (model 3), the EQ-5D no longer indicated differences in HRQoL between any of the diabetes severity groups. Persons with diabetes had statistically significantly lower HRQoL scores than those without diabetes for all other measures with the exception of the SF-6D12, the SF-12 MCS, and the SF-36v2 MCS. Persons with diabetes taking insulin had statistically significantly lower HRQoL scores than persons without diabetes for all other measures with the exception of the HUI2, HUI3, and the SF-36v2 MCS. Once more, the HALex was the only index that demonstrated a statistically significant difference in HRQoL scores between persons with diabetes taking insulin and those with diabetes but not on insulin.
Table 4 depicts effect sizes between diabetes severity groups for each HRQoL measure. Effect sizes ranged from 0.03 to 1.05 across all measures and were small to moderate for the majority of comparisons. Across all measures, effect sizes were generally larger for the comparison of the diabetes and insulin group to the no diabetes group than for the other comparisons. The HALex had the largest effect size across all comparisons, followed by the SF-12 PCS and the SF-36v2 PCS.

DISCUSSION
Across all HRQoL measures, persons with diabetes (regardless of severity) demonstrated lower HRQoL scores than those without diabetes, and the largest differences were seen between persons with diabetes taking insulin and persons without diabetes. This is not surprising given that the addition of insulin to the diabetes treatment regimen is indicative of more severe disease and, possibly, a larger treatment burden. The overall means for persons with diabetes were similar to estimates from previous studies of generic measures in persons with diabetes [7,13,31]. In both the unadjusted and adjusted analyses, the EQ-5D produced higher estimates of HRQoL than other preference-based utility measures. This is most likely because the EQ-5D has fewer items/domains than other measures, allowing for less variability in individual scores. All measures seemed to be able to discriminate between persons with and without diabetes regardless of insulin status, as indicated by their small to moderate effect sizes, a finding consistent with prior literature [2-13, 15]. Only the HALex, however, was able to demonstrate a statistically significant difference between the diabetes with insulin group and the diabetes without insulin group once we had controlled for age, sex, race, and other comorbid conditions. This is in contrast to the findings of Redekop and colleagues, who found that EQ-5D scores were lower for patients taking insulin compared to those on oral medications or diet alone [19], and of Kontodimopoulos and colleagues, who found lower scores on the EQ-5D and SF-6D for individuals with diabetes and additional comorbid conditions [20]. The HALex has demonstrated more sensitivity to differences in HRQoL for individuals with multiple self-reported health conditions (a rough proxy for illness severity) compared to the EQ-5D and the SF-6D [32], which is consistent with our finding that the HALex was able to discriminate between diabetes severity groups.
This is the first study to concurrently examine the performance of 11 generic health measures in the same sample of patients with diabetes. Similar attempts to examine the performance of generic health measures in coronary heart disease have been published [33]. The preference-based measures provide roughly similar estimates of HRQoL, though the EQ-5D does appear to indicate higher HRQoL compared to other measures. Mean scores for the SF-12 and SF-36v2 cannot be directly compared to the scores on other measures because the SF-12 and SF-36v2 are scored on a scale of 0 to 100 rather than 0 to 1. By calculating effect sizes, we were able to use a standardized format to draw general conclusions about assessing HRQoL across all measures, including the SF-12 and SF-36v2. The physical health component scores of the SF-12 and SF-36v2 were lower than the mental health component scores for all severity groups, indicating the importance of physical health in determining quality of life for persons with diabetes.
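The reason effect sizes allow comparison across instruments scored on different scales is that the standardized difference does not change when a measure is rescaled. The short Python sketch below illustrates this with made-up numbers that are not results from this study; the function name `standardized_difference` is hypothetical.

```python
def standardized_difference(mean_a: float, mean_b: float, residual_sd: float) -> float:
    """Group difference expressed in units of the residual standard deviation."""
    return (mean_a - mean_b) / residual_sd


# Hypothetical group means and residual SD on a 0 to 1 utility scale.
d_unit_scale = standardized_difference(0.80, 0.70, 0.25)

# The same hypothetical values after multiplying the scale by 100,
# mimicking a measure scored from 0 to 100.
d_hundred_scale = standardized_difference(80.0, 70.0, 25.0)

# The two standardized differences agree up to floating-point rounding.
same_effect = abs(d_unit_scale - d_hundred_scale) < 1e-9
```

Because the numerator and the denominator scale together, the raw 0 to 100 scores of the SF-12 and SF-36v2 and the 0 to 1 utility scores yield effect sizes on a common footing.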
Only the HALex demonstrated statistically significant differences between severity levels of diabetes. The HALex is the only measure that uses self-rated health as a mechanism for describing health states [21]. It is possible that self-rated perceptions of health are more sensitive to differences in diabetes severity than other rating systems. There were small, but noticeable, effect sizes for the SF-6D12, SF-12 MCS, SF-12 PCS, and the SF-36v2 MCS. Prior research has indicated that there is a difference in HRQoL depending on treatment-assessed severity [16,19]. Also of interest, for the SF-12 and SF-36v2, effect sizes were larger for the PCS than for the MCS in comparisons between persons with and without diabetes and between persons with diabetes taking insulin and those without diabetes. However, in the comparison between persons with diabetes taking insulin and those with diabetes but not taking insulin, effect sizes were larger for the MCS than for the PCS. It is possible that physical health is better at discriminating between those who are ill and those who are well, whereas mental health may be better at distinguishing levels of severity among those who are already ill. Further research is needed to examine this possibility.
This study has several notable limitations. First, diabetes is self-reported and medication regimen was used as a proxy for disease severity. Self-reported diabetes status has demonstrated acceptable reliability in prior studies [34-37]. Escalation of diabetes treatment from diet alone to oral medication to the use of insulin is generally representative of the severity of diabetes. Another limitation is that the NHMS data were generated from a random-digit-dial survey; therefore, the respondents are potentially more likely to have higher incomes or be more educated than the general population.
Despite these limitations, this study provides evidence that there are somewhat systematic differences in HRQoL estimates from various generic instruments. The EQ-5D appears to produce the highest estimates, while the QWB-SA produces the lowest. This reflects the possibility that each instrument is measuring a different aspect of HRQoL and is consistent with prior research in non-disease-specific samples [21,38].

CONCLUSION
The evidence demonstrates that the HALex might be the most useful generic measure of HRQoL for researchers interested in differences in severity in cross-sectional samples of patients with diabetes. Further research is necessary to assess the sensitivity of the HALex to changes over time. Though diabetes-specific measures might be preferred for establishing group differences, the ubiquity of generic measures in diabetes-related research indicates a need for further assessment of the utility of these measures in a disease-specific population.