BACKGROUND AND OBJECTIVES: Remote assessment of respiratory distress using telemedicine-enabled audio-video conferencing (TM) is of value for medical decision-making. Our goal was to evaluate the interobserver reliability (IOR) of TM compared with face-to-face (FTF) assessment of respiratory distress in children.
METHODS: A prospective cohort study was performed in a pediatric emergency department from July 2012 to February 2013. Children (aged 0–18 years) who presented with signs of respiratory distress were included in the study. The respiratory score is a 4-item, 12-point scale (respiratory rate [1–3], retractions [0–3], dyspnea [0–3], and wheezing [0–3]) that assesses the severity of a child’s respiratory distress. Each child was evaluated by a pair of observers from a pool of 25 observers. The first observer evaluated the patient FTF, and the second observer simultaneously and independently evaluated the patient remotely via TM. The overall respiratory distress severity is based on the respiratory score and reported as nonsevere (≤8) or severe (≥9) respiratory distress. The IOR between FTF and TM assessment was measured using a 2-way mixed model, absolute agreement, average measure intraclass correlation coefficient (ICC).
RESULTS: Forty-eight patients and 135 paired observations were recorded. IOR between the FTF and TM groups for total respiratory score had an ICC of 0.95 (confidence interval 0.93–0.96), and for subscores, the ICCs were as follows: respiratory rate = 0.92, retractions = 0.85, dyspnea = 0.94, and wheezing = 0.77.
CONCLUSIONS: TM is a reliable tool to assess the severity of respiratory distress in children.
In children, respiratory illnesses (such as asthma, bronchiolitis, and pneumonia) are a common reason for emergency department visits and hospital admissions, including transfer to tertiary care centers.1–3 Lack of resources and onsite subspecialty support in community hospitals is a frequent reason for transfer of children to tertiary centers. Assessment of clinical signs using telemedicine (TM) from a remote location has recently transformed pediatric patient care.4
Reliable interpretation of physical signs in pediatric patients is important for illness management and transfer decisions. Several studies in children comparing TM with traditional face-to-face (FTF) evaluation of patients have shown that TM is an effective tool in clinical evaluation of acute minor illnesses,5 wound and minor injury assessment,6–8 and burn assessment.9 These studies have used simple video consultation for children with noncritical illnesses. In contrast, respiratory status evaluation is complex, involving auscultation of the chest and interaction with the child. TM-enabled high-definition audio-video conferencing allows interactive discussion and relays clinical findings (physical signs). The addition of peripheral devices, such as an electronic stethoscope, allows a comprehensive clinical evaluation that entails auscultation of the chest for breath and heart sounds.4,10 A recently published study by Yager et al showed that TM is a useful tool in evaluating difficult or complex physical signs pertaining to circulatory and neurologic status in children.11 That study did not evaluate children with respiratory illnesses.
There is little evidence to support the precision and reliability of remote assessment of respiratory distress in children using TM. Interobserver reliability (IOR) is a measure used to evaluate agreement between or among observers. The purpose of this study is to evaluate the IOR of TM compared with FTF assessment of respiratory distress in children.
We conducted an observational, prospective cohort study using a convenience sample of 48 children (aged 0–18 years) who presented to a pediatric emergency department (PED) with respiratory distress from July 2012 to February 2013. Our PED is a regional tertiary care center. Multiple care providers specifically trained in children’s care, including physicians (MDs), nurses (RNs), and respiratory therapists (RTs), evaluate the respiratory status of pediatric patients.
Clinically stable children with varying degrees of acute respiratory distress who agreed to participate were included. Enrollment of children who were acutely ill was deferred until they were stabilized. No further follow-up on disposition or treatment decisions was performed for study purposes.
The study was approved by the university-affiliated school of medicine institutional review board. All eligible patients and parents seen in the PED were approached, and only those who signed informed consent were included. Enrollment was open 24 hours a day, 7 days a week, and was dependent on the availability of the providers and study investigators. A few patients missed the enrollment process because they were discharged from the PED before enrollment, and some were excluded from the study because the physical signs disappeared with initial treatment.
The study investigators were health care providers (PED nurses or physicians) or authors who were trained in and familiar with the TM equipment and study procedures. Two investigators participated in every patient enrollment. One investigator was at the patient’s bedside (FTF) and the second was at the TM site. The first investigator’s role was to identify a patient, obtain consent from the patient or parent, help the observers log in to the mobile video unit, initiate the TM call, educate the observers on how to use the respiratory score sheet, and communicate with the second investigator (TM site). The first investigator also facilitated the examination process, including auscultation of the chest. The second investigator at the TM site assisted the observers with the technical aspects of the TM call.
Data were collected on demographics and respiratory score. The research investigator identified patients seen in the PED with the signs and symptoms of respiratory distress. The diagnoses from the emergency physician’s medical record included acute exacerbation of asthma and/or status asthmaticus, reactive airway disease, bronchiolitis, pneumonia, or a combination of these, as shown in Table 1. Data on patients’ clinical outcomes were not collected in this study. The patients were divided by age into 3 groups, as shown in Table 1. Each patient’s respiratory status was evaluated by a pair of observers using Liu’s respiratory score (Table 2).12 The first observer evaluated the patient using the standard FTF approach, and the second simultaneously and independently evaluated the patient from a remote location using TM. Observers were instructed not to discuss their findings or speak loudly. Observers were recruited, depending on their availability during shifts, from a pool of 25 physicians, nurses, and respiratory therapists in the PED and the PICU. The study observers were randomly assigned to either FTF or TM. All the observers received training before the evaluation, including use of the respiratory score sheet and the basics of the equipment, such as the headphones and stethoscope. Before each observation, the connection was established from the mobile unit to the remote desktop unit. When potential participants were satisfied with the quality of the picture, the evaluation continued. Respiratory rate was counted for a full minute by the FTF and TM observers simultaneously. Auscultation for wheezing and breath sounds using an electronic stethoscope was performed on the patient’s chest bilaterally and equally. During the study, we strove to match observer pairs of similar provider type (eg, MD-MD, RN-RN, RT-RT). There were 10 paired observations with mixed provider type (eg, RN-RT, MD-RN).
There are several validated and reliable respiratory scoring methods used in clinical practice for assessment of respiratory distress in children with asthma or bronchiolitis.12–15 We selected the items in our respiratory score from the previous study by Liu et al12 because the score items are common signs of respiratory distress that are easy to use and age-specific. We used an arbitrary cutoff for defining the severity of respiratory distress into nonsevere (≤8) and severe (≥9) respiratory score.
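The scoring and severity classification described above can be illustrated with a minimal sketch. The function names and input checks are hypothetical and not part of the study protocol; the item ranges and the ≤8/≥9 cutoff are taken from the text.

```python
def respiratory_score(rr, retractions, dyspnea, wheezing):
    """Sum the 4 item scores into a total respiratory score (1-12).

    Item ranges follow the text: respiratory rate 1-3, all others 0-3.
    """
    if not 1 <= rr <= 3:
        raise ValueError("respiratory rate item must be 1-3")
    for name, value in (("retractions", retractions),
                        ("dyspnea", dyspnea),
                        ("wheezing", wheezing)):
        if not 0 <= value <= 3:
            raise ValueError(f"{name} item must be 0-3")
    return rr + retractions + dyspnea + wheezing


def severity(total, cutoff=9):
    """Classify a total score as 'severe' (>= cutoff) or 'nonsevere'."""
    return "severe" if total >= cutoff else "nonsevere"


# Hypothetical patient: RR item 3, retractions 2, dyspnea 2, wheezing 2.
total = respiratory_score(3, 2, 2, 2)
print(total, severity(total))
```

Keeping the cutoff as a parameter makes it easy to see how a different threshold would change the binary classification.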
The equipment used for the study consisted of a mobile video cart (Polycom Practitioner Cart with Polycom HDX 8000; Rubbermaid Healthcare, Huntersville, NC) placed in the PED and a desktop video unit (Cisco EX 90; Cisco Systems, San Jose, CA; Fig 1) located in a remote location within the pediatric department. Communication between these units used the hospital’s high-speed, high-bandwidth (>1 Mbps) wireless Internet connection. The mobile cart had a high-definition camera capturing video of the patient that could be viewed in real time from a remote location on a desktop monitor. The remote observer could focus the camera on the mobile cart and control its direction to examine an area of interest, such as suprasternal or intercostal retractions. A simultaneous audio connection with microphone and speakers or headphones in both units facilitated interactive communication between the 2 care providers. The electronic stethoscope, a peripheral device with headphones, was attached to the mobile video unit (GlobalMed Caretone Telephonic Stethoscope and Transmitter with Internet Protocol (IP) adapter and netlink 4-port switch; Rubbermaid Healthcare, Huntersville, NC; Fig 1). The wireless mobile cart was placed on the hospital’s wireless clinical network using a Cisco Aironet (Cisco Systems, San Jose, CA) access point. A video call established between the 2 end points used IP communication. The audio signals from the electronic stethoscope were segmented to prevent interference from the audio-video signals of the TM conversation.
The desktop unit situated in the remote location had a peripheral unit (GlobalMed Caretone Receiver with IP adapter and headphones; GlobalMed, Scottsdale, AZ) enabling the observer at the remote location to hear the auscultation findings from the patient. When the observer at the mobile unit (FTF) applied the electronic stethoscope to the patient’s chest, the same auscultation sounds could be heard simultaneously by the FTF observer (through headphones) and by the TM observer (through the headphones attached to the remote desktop unit). However, the sound intensity (or volume) could be adjusted separately at the respective sites (FTF or TM) to each observer’s comfort level.
During the study period, 2 parents who refused to participate in the study were excluded. Four minor events occurred related to connectivity issues and impaired picture quality (video froze). On 3 occasions, the issues were resolved by rebooting the computer. On the fourth occasion, the observation was paused and later resumed when the problem resolved spontaneously. In all those events, once the connection was reestablished, the quality of video was excellent. There were 3 other events related to audio from the electronic stethoscope. In those 3 events, the observers perceived the sound as loud and the ambient noises as coarse and difficult to differentiate from adventitious sounds such as rhonchi. The issues were resolved by turning the volume knob to the observers’ comfort level. During the study, 3 young children were agitated and uncooperative. The observations were delayed by a few minutes. On 2 occasions, the video helped to keep the children engaged with the monitor when the care provider was talking to them from the remote location. Despite a few minor issues, overall, care providers perceived TM-enabled video and audio quality as excellent.
For the statistical analysis of IOR, we used the intraclass correlation coefficient (ICC) with a 2-way mixed model and average ratings with absolute agreement between the raters (observers). In this study, ICC (3, 2) is the preferred reliability coefficient because it reflects both degree of correspondence and agreement among ratings. The 2-way mixed model ICC is suitable for our study because raters are seen as a fixed effect, and ratings/subjects are a random effect. We used “absolute agreement” ICC to account for the systematic variability due to raters. The “average measure” reliability gives the reliability of the mean of the ratings of all raters.
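As an illustration of the coefficient described above, the following sketch computes an absolute-agreement, average-measures ICC from a 2-way layout using the standard mean-squares formulation, (MSR − MSE) / (MSR + (MSC − MSE)/n). It is not the SPSS procedure used in the study, the function name is hypothetical, and the ratings shown are invented for demonstration only.

```python
def icc_absolute_average(ratings):
    """Two-way, absolute-agreement, average-measures ICC.

    ratings: list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)      # number of subjects (rows)
    k = len(ratings[0])   # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # The rater mean square stays in the denominator, so a systematic
    # offset between raters lowers the coefficient.
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Invented (FTF, TM) total scores for 6 hypothetical paired observations.
pairs = [[4, 4], [7, 8], [10, 10], [3, 3], [9, 8], [6, 6]]
print(round(icc_absolute_average(pairs), 3))
```

Because the rater term is retained, two observers who differ by a constant offset score below 1 even when their rankings of patients are identical, which is the sense in which the coefficient measures absolute agreement rather than consistency.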
Magnitude of ICC
In line with other reliability coefficients, as a general guideline for the cutoffs to grade the strength of agreement, we used values >0.75 as indicative of good reliability and those <0.75 as indicative of poor to moderate reliability. More specifically, we used the following interpretations: <0.25 = poor, 0.25–0.50 = fair, 0.50–0.75 = moderate, 0.75–0.90 = good, and >0.90 = excellent.16
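The grading scheme above can be written as a small lookup, sketched here for illustration. The function name is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the higher band, except 0.90, which is graded good because excellent is defined as >0.90) is an assumption, since the text does not specify it.

```python
def interpret_icc(icc):
    """Map an ICC value to the qualitative grade used in the text.

    Boundary handling is an assumption: each lower bound is inclusive,
    and exactly 0.90 is graded 'good' because 'excellent' is > 0.90.
    """
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    if icc >= 0.25:
        return "fair"
    return "poor"


# ICC values reported in the abstract for the total score and subscores.
for label, value in [("total", 0.95), ("respiratory rate", 0.92),
                     ("retractions", 0.85), ("dyspnea", 0.94),
                     ("wheezing", 0.77)]:
    print(label, value, interpret_icc(value))
```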
Sample Size Justification
Our sample size calculation was built on the hypothesis testing approach of Walter et al,17 which requires a desired power level, a magnitude of the predicted ICC, and a lower confidence limit. We set the optimal sample size for ICC on the basis of the desired ≥0.80 power level, the predicted ICC magnitude at ≥0.90, and the confidence level within 95%. Applying our predicted values and 2 ratings per subject, the needed sample size ranged from 5 when the estimated ICC was 0.9 to 616 when the estimated ICC was 0.1. We anticipated collecting ∼110 ratings over the study period and accrued 135 ratings, although fewer ratings were required to detect the predicted effect.
We calculated sensitivity and specificity for severity of respiratory distress as a binary value (nonsevere versus severe) observed between FTF and TM. The cutoff score was set to 9 to differentiate nonsevere from severe distress (nonsevere ≤8 and severe ≥9). The FTF evaluation was taken as the gold standard reference point for evaluation. A 95% confidence interval (CI) was calculated on all samples.18 A receiver operating characteristic (ROC) analysis was performed, with rating (FTF) and discrete classification (TM) data, to estimate the sensitivity and specificity of TM-assisted evaluation for detecting the severity of respiratory distress. All analyses were performed using the statistical package SPSS version 20 (IBM, Armonk, NY).19
Forty-eight patients aged 0 to 18 years were examined using FTF and TM equipment. Our youngest patient was 8 months old and the oldest was 18 years old. A total of 135 paired observations were obtained from a pool of 25 care providers. The demographic data of all study participants are summarized in Table 1. The most common diagnosis was acute exacerbation of asthma and/or status asthmaticus (79.1%).
Table 3 depicts the ICC values with 95% CIs for respiratory scores that include overall score (1–12), respiratory severity score (nonsevere vs severe), and scores for individual clinical parameters such as respiratory rate (RR; 1–3), retractions (0–3), dyspnea (0–3), and wheezing (0–3).
There was excellent agreement between FTF and TM for overall respiratory score (ICC = 0.95). The IOR for RR (ICC = 0.92) and dyspnea (ICC = 0.94) was excellent. The IOR for retractions (ICC = 0.85) and wheezing (ICC = 0.77) was good. Good agreement was also noted in differentiating nonsevere from severe respiratory distress (ICC = 0.80).
We calculated the sensitivity and specificity of TM-assisted evaluation for detecting the severity of respiratory distress using an ROC analysis. The ROC area under the curve was 0.84 (95% CI 0.77–0.91). With the respiratory score cutoff described above (nonsevere ≤8 and severe ≥9), the sensitivity of TM-assisted evaluation was 83.3% (ie, TM examinations correctly identified 45 of 54 patient ratings with “severe” respiratory distress), the specificity was 84% (ie, TM examinations correctly identified 68 of 81 patient ratings with “nonsevere” respiratory distress), and the κ was 0.67.
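The sensitivity and specificity above follow directly from the reported counts, as the sketch below shows. The function name is hypothetical, and the false-negative (9) and false-positive (13) counts are derived here by subtraction from the reported totals (54 − 45 and 81 − 68). The unweighted Cohen κ computed this way is approximately 0.66, close to the reported 0.67; the small difference may reflect rounding or a different computation in the original analysis.

```python
def binary_agreement(tp, fn, fp, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    FTF is treated as the reference standard and TM as the test.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each method.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Counts taken or derived from the text: 45 of 54 severe ratings and
# 68 of 81 nonsevere ratings were classified the same way by TM.
sens, spec, kappa = binary_agreement(tp=45, fn=9, fp=13, tn=68)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} kappa={kappa:.2f}")
```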
In our study, we found a good to excellent range of IOR (ICC ≥0.75) in all elements of the respiratory score (RR, retractions, dyspnea, wheezing, overall respiratory score, and severity of respiratory distress), indicating that observers had a high degree of agreement between the FTF and TM groups. The IOR in our study was measured using a 2-way mixed average measure ICC to assess the degree to which observers provided absolute agreement and consistency in their observations of respiratory distress across subjects. The high ICC suggests that a minimal amount of measurement error was introduced by the independent observers, and therefore statistical power for subsequent analyses is not substantially reduced. The reliable, remote evaluation of clinical signs of respiratory distress is important to care providers who care for hospitalized children for 2 reasons: first, respiratory illnesses are the most common reason for emergency department visits in children, including transfers to tertiary care centers1–3; second, a child with respiratory distress requires comprehensive evaluation that many nonpediatric physicians might not be comfortable conducting. Our study is the first to use TM to examine children presenting to the ED with varying degrees of acute respiratory distress.
We found a high level of IOR for individual components of respiratory distress. The agreement for RR obtained via TM was excellent (ICC = 0.92) compared with FTF examination. The RR in the assessment scale is measured objectively and thus subject to less variation. The RR in children varies minute by minute and, if measured successively, may result in variation or poor agreement. In the study by Liu et al, the agreement for RR was poor between the 2 observers because the RR was counted by 2 observers successively.12 In our study, the RR was measured simultaneously by the 2 observers, and as a result, agreement was high. Dyspnea is another respiratory parameter that had excellent IOR (ICC = 0.94) between FTF and TM observations. Dyspnea is a subjective measure that accounts for feeding, vocalization, and activity, and it may be a better reflection of the patient’s overall work of breathing and ability to cope. IOR is best assessed when the observers evaluate a subject simultaneously and independently. This eliminates true change in the patient’s condition as a source of measurement error when comparing observers’ scores.
The IOR between the FTF and TM groups was slightly lower for wheezing (ICC = 0.77) and retractions (ICC = 0.85) compared with the other subscores (RR and dyspnea) in our study. Wheezing is the respiratory component evaluated by using an electronic stethoscope and had lower agreement than all other components of the respiratory score. This may be because of difficulty experienced by the observers in our study in defining breath-sound intensity. It is also possible that extraneous or ambient noises from the use of the electronic stethoscope masked wheezing for some observers. In nearly all cases, the ambient noise was reduced by adjusting the volume knob to the observers’ comfort level, indicating a need to explore new TM training on the use of the electronic stethoscope. A few validation studies have shown that interobserver variability exists even during simultaneous assessments by equally skilled providers.15,20–22
In our study, the IOR for overall respiratory distress score of 1 to 12 (a single variable reported as a continuum) was in excellent agreement (ICC = 0.95) between groups (FTF and TM). The respiratory distress severity variable reported as a binary value (nonsevere versus severe) had a good IOR (ICC = 0.80). This is similar to the study by Liu et al,12 who found good agreement for all observer pairs (84% with a weighted κ of 0.62), indicating that our IORs for FTF versus TM assessment are noninferior compared with 2 FTF assessments (ie, variations in our observer scores are at least partially due to normal variation among medical staff rather than the modality [FTF or TM] used for assessment). Another reason for the high agreement in respiratory scores (ICC = 0.80–0.95) may be that our PED providers are especially familiar with TM equipment because of their involvement in other projects. The learning curve was simple for observers because the study coordinators, who were trained in and familiar with the TM equipment and study protocols and procedures, consistently facilitated the study enrollment. TM may be used in many ways, but its purpose in this study was to reliably identify children in severe respiratory distress from a remote location, which may assist in management decisions. TM evaluations in this regard may influence the decision on disposition by predicting a need for transfer and may even prevent unnecessary transfer. TM-based assessment of respiratory status could also be used by subspecialists doing inpatient TM consults at remote hospitals.
Several limitations should be considered when interpreting these results. First, the study was performed in a tertiary care center with care providers specifically trained in the care of children. Therefore, the findings may not be generalizable to community hospitals or clinic settings. Second, it is a convenience sample. The enrollment of study subjects was selective, with a higher proportion of patients who had severe respiratory distress, and depended on the availability of the observers. This could introduce bias toward less sick patients if the selection was not random. Third, the respiratory scale was used in this study primarily to evaluate the reliability of TM-enabled evaluation for children with respiratory distress rather than to measure the clinical outcome of the patient, disposition, or the effects of treatment on the outcome. Fourth, despite providing specific instructions to the observers not to discuss their findings, there may be potential for sharing of the content or their findings among other participants, especially among the TM group members, who were in an isolated room. We monitored the observers during the study to minimize interaction or sharing of information. Finally, many programs use different equipment and software (stethoscope, camera, etc) from those used in this study. Our IOR for auscultation and other findings might not be generalizable to hospitals using different software or peripherals.
It is essential to deliver high-quality audio and high-definition video to enable accurate remote assessment and visualization of physical signs of respiratory distress. The most important challenge was the observers’ understanding of the TM equipment. There can also be a protracted learning curve for participants. The other minor challenges include technical issues such as connectivity (slow connection, dropped calls) and audio or video quality. Most of these challenges can be overcome by providing a dedicated broadband cable connection and appropriate training to the care providers. Dedicated connections can be a component of infrastructure expansion but will be cost prohibitive for many institutions. The remote location of the TM observer also presents challenges. The TM observer cannot control a patient’s behavior or activities, and so an observer at the bedside is needed to assist when the patient is uncooperative. Interestingly, some young children were excited or calmed by watching the care provider’s face on the monitor.
There is a good to excellent IOR among care providers in the clinical evaluation of respiratory distress in children using TM compared with traditional FTF evaluation. Our results suggest that TM may be an effective and reliable tool for the remote assessment of respiratory distress in children.
We thank John Kornak (Telehealth Director) and Andre Burton (Telehealth Engineer), University of Maryland Medical Center, for providing technical assistance and negotiating with the vendors for pilot testing of equipment.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: No external funding.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
This work was presented as an abstract at the Pediatric Research Conference.
This work was presented in part at the Pediatric Academic Societies and Asian Society for Pediatric Research Joint Meeting; May 3-6, 2014; Vancouver, Canada.
- Merrill CT,
- Owens PL,
- Stocks C
- Merrill C,
- Owens PL
- Van Dillen C,
- Silvestri S,
- Haney M,
- et al
- Benger JR,
- Noble SM,
- Coast J,
- Kendall JM
- Smith AC,
- Kimble R,
- Mill J,
- Bailey D,
- O’Rourke P,
- Wootton R
- Belmont JM,
- Mattioli LF
- Reichenheim ME
- 19. Statistical Software Package SPSS [computer program]. Version 20. Armonk, NY: IBM Corp; 2010
- Marin JR,
- Bilker W,
- Lautenbach E,
- Alpern ER
- Copyright © 2016 by the American Academy of Pediatrics