August 2016, Volume 6, Issue 8

Identifying Communication-Impaired Pediatric Patients Using Detailed Hospital Administrative Data

  1. Douglas L. Hill, PhD (a)
  2. Karen W. Carroll, BS (a)
  3. Dingwei Dai, PhD (b)
  4. Jennifer A. Faerber, PhD (a)
  5. Susan L. Dougherty, PhD (a)
  6. Chris Feudtner, MD, PhD, MPH (a)

(a) Department of Pediatrics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania
(b) Informatics, Independence Blue Cross, Philadelphia, Pennsylvania

Address correspondence to Chris Feudtner, MD, PhD, MPH, General Pediatrics, 3535 Market St, Room 1523, The Children’s Hospital of Philadelphia, 34th and Civic Center Blvd, Philadelphia, PA 19104. E-mail: feudtner@email.chop.edu

Dr Hill developed the study concept and design, drafted the study instruments, supervised the study, acquired data, analyzed and interpreted the data, and drafted the manuscript; Ms Carroll acquired data and provided administrative, technical, and material support; Dr Dai acquired data from PHIS and created indicator variables based on ICD-9-CM codes; Dr Faerber analyzed and interpreted the data; Dr Dougherty developed the study concept and design; Dr Feudtner developed the study concept and design, obtained funding, drafted the study instruments, and analyzed and interpreted the data; and all authors critically revised the manuscript for important intellectual content and approved the final manuscript as submitted.


BACKGROUND AND OBJECTIVES: Pediatric inpatients with communication impairment may experience inadequate pain and symptom management. Research regarding potential variation in care among patients with and without communication impairment is hampered because existing pediatric databases do not include information about patient communication ability per se, even though these data sets do contain information about diagnoses and medical interventions that are probably correlated with the probability of communication impairment. Our objective was to develop and evaluate a classification model to identify patients in a large administrative database likely to be communication impaired.

METHODS: Our sample included 236 hospitalized patients aged ≥12 months whose ability to communicate about pain had been assessed. We randomly split this sample into development (n = 118) and validation (n = 118) sets. A priori, we developed a set of specific diagnoses, technology dependencies, procedures, and medications recorded in the Pediatric Health Information System likely to be strongly associated with communication impairment. We used logistic regression modeling to calculate the probability of communication impairment for each patient in the development set, assessed the model performance, and evaluated the performance of the 11-variable model in the validation set.

RESULTS: In the validation sample, the classification model showed excellent classification accuracy (area under the receiver operating characteristic curve 0.92; sensitivity 82.6%; 95% confidence interval, 74%–100%; specificity 86.3%; 95% confidence interval, 80%–97%). For the complete sample, the predicted probability of communication impairment demonstrated excellent calibration with the observed communication impairment status.

CONCLUSIONS: Hospitalized pediatric patients with communication impairment can be accurately identified in a large hospital administrative database.

Hospitalized children experience many unpleasant symptoms including nausea, discomfort, fatigue, decreased appetite, drowsiness, and pain that often are not adequately assessed or treated.1–5 Pediatric patients who are communication impaired, especially those whose impairments are due not to young age or language barriers but to cognitive disability or the effects of medications, pose particular challenges for effective pain and symptom management.6,7 Physicians may have greater difficulty diagnosing the cause of pain or other symptoms and effectively providing symptom relief for these children and adolescents.8–10 We previously conducted a point prevalence study at a large pediatric hospital to identify which inpatients had difficulty communicating based on bedside nurse reports.11 We found that 38% of inpatients had some difficulty communicating and that 61% of these patients with communication impairment had experienced pain during the hospitalization.

Studies at individual institutions have shown that children who are cognitively impaired or who cannot speak English receive less pain medication than similar patients with the same conditions.12–15 Whether institutions vary in how pediatric patients who are communication impaired are treated for pain has yet to be explored. Large, clinically detailed patient data sets can be used to examine variations in practice across hospitals, identifying potential areas for improvement. For example, 1 study found that opioid use varied substantially across hospitals even after patient demographic and clinical characteristics, hospital type, and hospital patient volume were accounted for.16 Currently researchers cannot directly examine treatment differences for communication-impaired patients across institutions because existing pediatric health data sets do not include information about patient communication ability. We therefore specifically sought to develop a classification model that, using information typically contained in hospital administrative databases, can accurately identify patients who are likely to be communication impaired. An accurate classification model would then enable comparison of pain and symptom management between patients with high or low likelihood of having communication impairment within and between institutions. Conceptually, this approach is analogous to previous studies that have used various classification methods to identify pediatric patients in large administrative data sets with specific conditions such as autism spectrum disorder, sickle cell disease, urinary tract infections, and pneumonia.17–20


Methods

Human Subjects Protection

The Children’s Hospital of Philadelphia Committee for the Protection of Human Research Subjects approved the protocol for this study.

Study Sample

Our study sample was based on the previously mentioned point prevalence study of communication impairment among all hospitalized pediatric patients aged ≥12 months that we conducted in our children’s hospital.11 Patient medical record numbers were obtained from nurse reports. Patient age, sex, ethnicity, and spoken language were obtained from the medical record.

Clinically Detailed Administrative Data Source and Merged Data Set

Several months after these patients were discharged from the hospital, their Pediatric Health Information System (PHIS) data became available, and by using each patient’s medical record number we merged the data gathered from the point prevalence study with the clinically detailed PHIS data. The PHIS database is maintained by the Children’s Hospital Association (Overland Park, KS) and includes resource utilization data from 43 tertiary children’s hospitals representing most major US metropolitan areas and ∼70% of tertiary pediatric acute care hospital admissions in the United States.21 The PHIS database includes patient demographics, diagnoses, and procedures, as well as detailed pharmacy information. Data elements include International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes and Clinical Transaction Classification codes for each procedure. The PHIS database also includes generic drug entity dispensed (including medications used to alleviate pain and symptoms) and clinical services rendered for each day of hospital stay for each patient. PHIS data quality and reliability are ensured through a joint effort between the Children’s Hospital Association and participating hospitals, and data are included only if classified errors occur in <2% of a hospital’s quarterly data, which are deidentified before extraction and analysis. Major Diagnostic Categories were based on the patient’s assigned All Patient Refined Diagnosis Related Groups classification in the PHIS database.
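The record linkage described above can be sketched as a simple inner join on medical record number. This is only an illustration under stated assumptions: the field names (`mrn`, `impaired`, `icd9_codes`, `age_years`) and the sample records are hypothetical and do not reflect the actual PHIS schema or the study data.

```python
# Hypothetical records; field names and values are illustrative only.
nurse_reports = [
    {"mrn": "A001", "impaired": 1},
    {"mrn": "A002", "impaired": 0},
    {"mrn": "A003", "impaired": 0},  # no matching PHIS record below
]
phis_records = [
    {"mrn": "A001", "icd9_codes": ["343.9"], "age_years": 7},
    {"mrn": "A002", "icd9_codes": ["540.9"], "age_years": 12},
]


def merge_on_mrn(reports, phis):
    """Inner join: keep only nurse reports that have a matching PHIS record."""
    phis_by_mrn = {record["mrn"]: record for record in phis}
    merged, unmatched = [], []
    for report in reports:
        match = phis_by_mrn.get(report["mrn"])
        if match is None:
            unmatched.append(report["mrn"])  # would be excluded from analysis
        else:
            merged.append({**match, **report})
    return merged, unmatched


merged, unmatched = merge_on_mrn(nurse_reports, phis_records)
print(len(merged), "matched;", "unmatched MRNs:", unmatched)
```

Tracking the unmatched identifiers separately makes it straightforward to report how many patients were excluded because they could not be located in the administrative data.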

Development of the Communication Impairment Classification Model

Our research team included a pediatrician with extensive clinical experience caring for hospitalized children with serious illness and a psychologist with experience caring for children with communication disorders. Based on our clinical experience and consultations with other pediatricians, psychologists, and pain and symptom experts, we developed a priori the set of codes for conditions, medications, technology dependencies, and procedures or tests that were likely to be associated with communication impairment. We then subdivided conditions and medications into those with a high or moderate probability of being associated with communication impairment for a patient (see Appendix 1). Some conditions and procedures on the list are rare, and we did not expect cases of every single code to occur in a data set of this size. We therefore created 6 dichotomous indicator variables (condition–high, condition–moderate, medication–high, medication–moderate, technology dependencies, and procedures or tests). Each indicator variable was equal to 0 if the patient had no codes from the list and equal to 1 if the patient had ≥1 code from the list. The classification model consisted of these 6 variables and indicator variables for 6 age categories (1 year, 2 years, 3–4 years, 5–9 years, 10–17 years, ≥18 years). These 11 variables were used to calculate the probability that a given patient was communication impaired, as reported below.
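The indicator-variable construction described above might look like the following sketch. The code lists here are hypothetical placeholders (the study's actual lists are in Appendix 1), and the sketch assumes age is recorded in whole years. Six age categories yield the stated 11 variables only if 1 category serves as the reference level in the regression; that reading is an assumption, not something the text states.

```python
# Hypothetical placeholder codes; the study's actual lists are in its Appendix 1.
CODE_LISTS = {
    "condition_high": {"COND_H1", "COND_H2"},
    "condition_moderate": {"COND_M1"},
    "medication_high": {"MED_H1"},
    "medication_moderate": {"MED_M1", "MED_M2"},
    "technology_dependency": {"TECH_1"},
    "procedure_or_test": {"PROC_1"},
}


def indicator_variables(patient_codes):
    """Return 1 for each list with at least one matching code, otherwise 0."""
    codes = set(patient_codes)
    return {name: int(bool(codes & members)) for name, members in CODE_LISTS.items()}


def age_category(age_years):
    """Map age in whole years to the 6 categories named in the text."""
    if age_years < 1:
        raise ValueError("the study sample was limited to patients aged >= 12 months")
    if age_years == 1:
        return "1"
    if age_years == 2:
        return "2"
    if age_years <= 4:
        return "3-4"
    if age_years <= 9:
        return "5-9"
    if age_years <= 17:
        return "10-17"
    return ">=18"


# Hypothetical patient with one high-probability condition and one
# moderate-probability medication on record.
patient = {"codes": ["COND_H1", "MED_M2", "OTHER"], "age_years": 7}
features = indicator_variables(patient["codes"])
features["age_category"] = age_category(patient["age_years"])
print(features)
```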

Statistical Analysis

The data set was randomly split into development and validation samples in a 1:1 ratio. In the development sample, we used logistic regression modeling to derive the probability of communication impairment for each patient. The gold standard of communication ability was the bedside nurse report. Patient communication impairment (defined as inability to communicate clearly, using words in full sentences for patients aged ≥5 years, inability to communicate in simple sentences for patients aged 2–4 years, and inability to communicate at all for patients aged 1 year) was the outcome of the model. The predictors were conditions (high and moderate probability), medications (high and moderate probability), technology dependencies, medical procedures or tests, and patient age (see Appendix 1 for a complete list of all variables and values used in the classification model). We assessed the model’s performance by examining the area under the receiver operating characteristic curve (AUC).22,23 The AUC may be interpreted as an estimate of the probability that a randomly chosen person with a specific condition has a higher score than a randomly chosen person without the condition. Unlike sensitivity and specificity, the AUC is not affected by the choice of cutoff value because it evaluates the model at all cutoff values.24,25 An AUC of 0.90 or more is considered excellent discrimination of cases, 0.80 to 0.89 good, and 0.70 to 0.79 fair.26 We also calculated the sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio by using a cutoff point of 0.25 predicted probability of communication impairment.27
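The probabilistic interpretation of the AUC can be made concrete with a small sketch that counts correctly ordered pairs directly. The predicted probabilities below are made up for illustration and are not study data; ties are counted as half, which is the usual convention.

```python
def auc_by_pairs(scores_impaired, scores_not_impaired):
    """Fraction of (impaired, not impaired) pairs in which the impaired
    patient has the higher predicted probability; ties count as 0.5."""
    wins = 0.0
    for s_pos in scores_impaired:
        for s_neg in scores_not_impaired:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_impaired) * len(scores_not_impaired))


# Hypothetical predicted probabilities, not taken from the study.
impaired = [0.90, 0.70, 0.20]
not_impaired = [0.10, 0.30, 0.05, 0.15]

auc = auc_by_pairs(impaired, not_impaired)
print(f"AUC = {auc:.3f}")  # 11 of 12 pairs are correctly ordered
```

No cutoff appears anywhere in this calculation, which is why the AUC does not depend on the 0.25 threshold used for the other performance measures.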

Satisfied with the model’s performance in the development sample, we then applied the unaltered model to the validation sample and used the same classification performance measures to evaluate the model, calculating the AUC and constructing 1000 bootstrap samples to calculate the 95% confidence intervals (CIs) for measures of sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio.
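A bootstrap confidence interval of this kind can be sketched as follows. The text does not say which bootstrap interval was used, so the percentile method here is an assumption, and the resulting interval is not expected to match the published one. The input counts are those reported for the validation sample in the Results (19 of 23 impaired and 82 of 95 unimpaired patients correctly classified).

```python
import random


def sensitivity(pairs):
    """pairs: list of (observed, predicted) tuples coded 0/1."""
    tp = sum(1 for obs, pred in pairs if obs == 1 and pred == 1)
    fn = sum(1 for obs, pred in pairs if obs == 1 and pred == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(pairs, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]  # sample with replacement
        value = statistic(resample)
        if value == value:  # drop NaN from resamples with no impaired patients
            estimates.append(value)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Validation-sample counts from the Results: 19 TP, 4 FN, 82 TN, 13 FP.
sample = [(1, 1)] * 19 + [(1, 0)] * 4 + [(0, 0)] * 82 + [(0, 1)] * 13

point = sensitivity(sample)
lower, upper = bootstrap_ci(sample, sensitivity)
print(f"sensitivity {point:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The same `bootstrap_ci` function accepts any statistic computed from the resampled pairs, so specificity, predictive values, and likelihood ratios could be handled by passing a different function.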

To assess the calibration of the model, we pooled the 2 samples (because the performance of the model was slightly better in the validation than in the development sample) and graphed the nurses’ reports of communication impairment (the gold standard, a dichotomous variable) against the predicted probability of communication impairment (a continuous variable). We plotted both the individual data points (each patient’s dichotomous report of communication impairment against the predicted probability for that patient based on the classification model) and the moving average of these reports across the range of predicted probabilities, using a local polynomial smooth function with a 95% CI.
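The idea behind this calibration check can be illustrated with a simpler stand-in. The sketch below groups patients into equal-width bins of predicted probability and compares the mean prediction with the observed impairment rate in each bin. This binned approach is a substitute for, not a reproduction of, the local polynomial smoothing used in the study, and the data are hypothetical.

```python
def calibration_table(predicted, observed, n_bins=4):
    """Compare mean predicted probability with observed rate per bin.

    Returns rows of (bin_low, bin_high, n, mean_predicted, observed_rate),
    skipping empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        obs_rate = sum(y for _, y in members) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, len(members), mean_pred, obs_rate))
    return rows


# Hypothetical predictions and nurse-reported outcomes, not study data.
predicted = [0.02, 0.05, 0.10, 0.20, 0.45, 0.55, 0.80, 0.90, 0.95]
observed = [0, 0, 0, 0, 1, 0, 1, 1, 1]

rows = calibration_table(predicted, observed)
for low, high, n, mean_pred, obs_rate in rows:
    print(f"{low:.2f}-{high:.2f}  n={n}  predicted={mean_pred:.2f}  observed={obs_rate:.2f}")
```

In a well-calibrated model the observed rate in each bin tracks the mean predicted probability; a smoother replaces the fixed bins with a moving window.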

Analyses were conducted with Stata 13.1 (Stata Corp, College Station, TX) and SAS version 9.3 (SAS Institute, Inc, Cary, NC).


Results

Nurses completed questionnaires for 259 inpatients aged ≥12 months. Nurse reports and PHIS data were matched for 236 inpatients aged ≥12 months. Four patients were excluded because the medical record number was recorded incorrectly, 1 was excluded because the patient was <12 months old, 11 were excluded because they had not been discharged in 2013, and 7 were excluded because they could not be located in the PHIS database. Patient demographic characteristics, and whether patients were communication impaired, are reported in Table 1. The ages of the patients ranged from 1 to 34 years (mean 10.2, SD 6.5), with the majority of patients (89%) being ≤18 years old. The 5 most common major diagnostic categories in the sample were digestive system (14%, 34/236); lymphatic, hematopoietic, and other malignancies (12%, 29/236); respiratory system (11%, 25/236); nervous system (10%, 23/236); and musculoskeletal system and connective tissue (9%, 21/236). Fifty patients (21%) in the sample had some degree of communication impairment according to nurse reports.


Table 1. Characteristics and Communication Impairment of 236 Inpatients Aged ≥12 mo

In the randomly selected development sample (n = 118), the classification model correctly identified 20 of 27 communication-impaired patients (AUC 0.89; sensitivity 74.1%; 95% CI, 68%–97%) and 80 of 91 patients who were not communication impaired (specificity 87.9%; 95% CI, 76%–96%; positive predictive value 64.5%; 95% CI, 50%–83%; negative predictive value 92.0%; 95% CI, 90%–99%; positive likelihood ratio 6.1; 95% CI, 3.3–20.2; negative likelihood ratio 0.3; 95% CI, 0.0–0.4).

The same classification model was used with the randomly selected validation sample (n = 118) and correctly identified 19 of 23 communication-impaired patients (AUC 0.92; sensitivity 82.6%; 95% CI, 74%–100%) and 82 of 95 patients who were not communication impaired (specificity 86.3%; 95% CI, 80%–97%; positive predictive value 59.4%; 95% CI, 50%–86%; negative predictive value 95.4%; 95% CI, 93%–100%; positive likelihood ratio 6.0; 95% CI, 4.2–28.8; negative likelihood ratio, 0.2; 95% CI, 0.0–0.3).
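The point estimates above follow directly from the reported counts. As a worked check, the sketch below recomputes each measure from the true-positive, false-negative, true-negative, and false-positive counts implied by the text (development: 20, 7, 80, 11; validation: 19, 4, 82, 13). The function and variable names are illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard 2x2 classification measures from cell counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Counts implied by the text: 20 of 27 and 80 of 91 (development),
# 19 of 23 and 82 of 95 (validation).
development = classification_metrics(tp=20, fn=7, tn=80, fp=11)
validation = classification_metrics(tp=19, fn=4, tn=82, fp=13)

for name, metrics in (("development", development), ("validation", validation)):
    print(name)
    for key, value in metrics.items():
        print(f"  {key}: {value:.3f}")
```

The recomputed values agree with the reported point estimates after rounding, with one small exception: the validation negative predictive value computes to 82/86, or about 95.3%, against the reported 95.4%.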

For the full sample of 236 patients, we used the same PHIS data–based classification model to calculate the predicted probability of communication impairment (which could range from 0 to 1). Odds ratios, P values, and 95% CIs for this logistic regression model are shown in Appendix 2. For each patient, plotting this prediction (positioned along the horizontal axis of Fig 1) with that patient’s observed communication impairment status based on the nurse reports (which were either 0 or 1 and positioned along the vertical axis of Fig 1) showed excellent calibration between the predicted probability and the observed impairment status, with the fitted line across all patients rising steadily from a low value for those predicted to have a low probability of impairment to a high value for those predicted to have a high probability of impairment (Fig 1).
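For readers less familiar with logistic regression, the link between the odds ratios in Appendix 2 and each patient's predicted probability is the standard logistic form shown below. The symbols are notation introduced here rather than the authors' own: $x_{ij}$ is the value of indicator variable $j$ for patient $i$, $\beta_0$ is the intercept, and $\mathrm{OR}_j$ is the odds ratio for variable $j$. Treating a probability exactly at the 0.25 cutoff as impaired is an assumption.

```latex
\hat{p}_i \;=\; \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{11} \beta_j x_{ij}\right)\right)},
\qquad \beta_j = \ln\!\left(\mathrm{OR}_j\right),
\qquad \text{classified as impaired if } \hat{p}_i \ge 0.25 .
```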


Figure 1. Predicted probability (which could range from 0 to 1) of communication impairment according to Communication Impaired Pediatric Patients classification model (horizontal axis) by the observed communication impairment status (vertical axis, present or none) from nurse reports for 236 patients aged ≥12 months.


Discussion

Our classification model accurately identified communication-impaired pediatric inpatients in a large hospital administrative database and produced well-calibrated estimates of the probability of communication impairment, including patients with low and high probability of communication impairment. Specifically, the probabilities calculated by the model (as shown in the figure) are bimodal (mostly either very low or very high probabilities) and are well calibrated (ie, the group of patients whom the model calculated as having midrange probabilities of communication impairment were observed as a group to have an equal chance of having or not having communication impairment, whereas most of the low-probability patients did not have communication impairment and most of the high-probability patients did).

The high AUC indicates that the model overall showed excellent discrimination of cases between communication-impaired and non–communication-impaired patients, and the model can be used to explore differences between these 2 groups of patients in large data sets. The model also showed good sensitivity, specificity, negative predictive value, positive likelihood ratio, and negative likelihood ratio. The slightly lower value for sensitivity and the lower value for positive predictive value indicate that the model may be more effective in identifying patients who are not communication impaired than identifying patients who are communication impaired. In no way should this model be used for diagnosing the communication ability of individual patients for clinical purposes.

Although the classification model performed very well, this study has at least 4 limitations that must be considered. First, these findings are based on a small number of patients from 1 institution, and the accuracy of the model may not generalize to other institutions. Second, the gold standard of patient communication ability was an assessment by bedside nurses. Although bedside nurses play a vital role in pain assessment and management for hospitalized patients, a communication assessment by parents or an independent assessor might have yielded different results. Third, we used ICD-9-CM diagnostic and procedure codes from PHIS that have not been validated (although problems arising from invalid codes probably would have eroded the accuracy of the classification model). Fourth, we did not convene a group of experts and use consensus-based methods to develop the classification model (although whether doing so would improve the already excellent discrimination characteristics of the model remains to be seen).

With these caveats in mind, how can the results of this study be put to use? Previous studies have found that pain is often undertreated among the general population of pediatric patients,3 among patients with cognitive impairment,12,13 and among patients whose parents do not speak English.15 Importantly, few data currently exist regarding pain and symptom management among the broader category of pediatric patients who are unable to communicate effectively. To ensure that the needs of these vulnerable patients are met, research should proceed along several tracks. One track would include primary data collection to determine the quality of pain management, and another would include potential interventions to improve care if deficiencies are found. This study represents a step down a third track, namely the study of pain management practices in large data sets. The calculated probability of communication impairment, based on data elements captured in clinically detailed hospital administrative data sets such as PHIS, can be used to determine whether there are disparities in pain management (or the treatment of other symptoms such as nausea or constipation) for patients who are likely to be communication impaired. For example, do patients who undergo the same procedure (eg, an appendectomy) receive different forms of pain management if they have a high probability of being communication impaired? Furthermore, examining whether such differences exist to the same degree across hospitals (which is to say, study variation in practice between hospitals) could inform additional studies to identify best practices. Our ultimate hope is that these “big data” analyses could guide research to improve pain and symptom management for communication-impaired pediatric patients and thereby improve outcomes for this vulnerable patient population.


Appendix 1. Diagnoses, Procedures, Tests, and Medications Relevant to Communication Impairment


Appendix 2. Logistic Regression Model for Probability of Communication Impairment


  • FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.

  • FUNDING: This study was funded by the Mayday Fund and the Milbank Foundation and in part by the Agency for Healthcare Research and Quality, Comparative Effectiveness and Safety of Hospital-Based Pediatric Palliative Care (grant 1R01HS018425). The Mayday Fund, the Milbank Foundation, and the Agency for Healthcare Research and Quality had no role in the drafting, editing, review, or approval of this manuscript.

  • POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

Abbreviations

AUC: area under the receiver operating characteristic curve
CI: confidence interval
ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification
PHIS: Pediatric Health Information System