October 2019, Volume 9 / Issue 10

Bringing Scientific Rigor to Survey Design in Health Care Research

  1. Sonja I. Ziniel, PhD, MA (a)
  2. Corrie E. McDaniel, DO (b)
  3. Jimmy Beck, MD, MEd (b)

  (a) Section of Pediatric Hospital Medicine, Department of Pediatrics, Children’s Hospital Colorado and School of Medicine, University of Colorado, Aurora, Colorado
  (b) Seattle Children’s Hospital, University of Washington, Seattle, Washington

Address correspondence to Sonja I. Ziniel, PhD, MA, Department of Pediatrics, Pediatric Hospital Medicine, University of Colorado School of Medicine, 13123 East 16th Ave, 302, Aurora, CO 80045. E-mail: sonja.ziniel{at}childrenscolorado.org

Drs Ziniel, McDaniel, and Beck drafted and reviewed the initial manuscript and approved the final manuscript as submitted.

The use of surveys in health care research has become increasingly popular. Often seen as an inexpensive, quick, and simple way to address a research question, surveys are especially attractive to researchers in training or during early career stages, when both research funding and the time to complete a research project are limited. Yet, when used as research instruments, surveys should be developed with the same scientific rigor as other measurement instruments used to collect data in health care research.

Traditional research methodologies are known for rigorous protocols, including exact specifications for reproducibility and often the presence of control groups to allow for comparisons between cohorts. Concepts such as reliability and validity are not only relevant for measurement instruments of “hard” data, such as thermometers or mass spectrometers, but also for measurement instruments of “soft” data, such as surveys. Researchers using surveys need to ensure that each question will result in a valid and reliable measure of what the question was intended to measure. Demonstrating the validity and reliability of survey questions requires great effort, and although a thorough psychometric validation is not always warranted, every survey should undergo a methodologically reproducible process through development, selection of potential respondents, testing, and administration so that survey results are reproducible and generalizable.

Despite numerous reference books,1–4 guidelines, and best practices on survey design,5–9 surveys are often constructed, administered, and submitted for publication without previous consultation of these references and their recommendations. In this article, we highlight key concepts for survey development, focusing on the critical elements of designing and implementing a survey to yield meaningful and publishable results.

Key Concepts of Survey Development

To Survey or Not to Survey? That Is the Question

Understanding when to use a survey is the first step in solid survey design. In considering the use of a survey, one ought to ask if the research question can more appropriately be answered by another method.

Surveys are particularly useful when trying to measure concepts that are abstract ideas, such as people’s beliefs, attitudes, or opinions. A well-designed survey can capture and quantify measures that otherwise are challenging to observe. For example, this may include concepts such as practitioner beliefs surrounding end-of-life care for medically complex patients or providers’ levels of burnout. Appropriately applying this principle, Srivastava et al10 distributed a survey to pediatric department chairs to clarify the definition of a hospitalist and to assess the practice patterns and views on training needs for pediatric hospitalists in academic centers.

Often, however, surveys are used to measure constructs such as clinical outcomes, tasks of high cognitive load, and observable behaviors. These data usually are more appropriately obtained through other research methods. For instance, although one could design a question asking people how many times per care encounter they wash their hands, a stronger approach would be to observe and quantify the actual behavior. Similarly, some inquiries are too challenging to attempt to answer via a survey question. Inquiries such as, “How many hours did you use the Internet last month?” become unanswerable questions, leading to guesses rather than accurate responses.

When considering if a research question is appropriate to answer via a survey, it is important to remember that surveys involve memory recall and to consider how respondents answer survey questions.11 Events are stored in human memory in an autobiographical way, meaning that they are usually linked temporally to personal experiences, such as family- or job-related events, rather than to objective calendar dates. For example, when asked how often someone has been hospitalized in the past 12 months, respondents typically have to reconstruct their life events over the past year. Furthermore, memories of these events are altered each time they are retrieved, with specific details replaced by more generic information as time passes. The more details a respondent has to retrieve from memory to answer a survey question, the more cognitively demanding it is to determine an answer. This often makes it more likely that the response given will be based on an estimate or a guess.

Identifying a Sampling Frame and Gauging the Generalizability of the Data

Choosing the correct sampling frame is critical in establishing results with generalizability to a target population.2 It is usually impossible to survey an entire population, and even when it is possible, gathering data from every individual of a target population is burdensome because of time and money constraints. As such, researchers often choose a smaller group of individuals intended to represent the population being studied. To adequately represent a larger population, the sampling frame must reflect key characteristics of the target population. A poorly chosen sampling frame will introduce significant bias into the resulting data.

For example, within pediatric hospital medicine, the American Academy of Pediatrics Section on Hospital Medicine Listserv functions as a robust cohort of pediatric hospitalists representing the single largest listing of pediatric hospitalists, currently with over 3800 subscribers (KT Powell, MD, PhD, FAAP, personal communication, 2019). Although the listserv at first glance appears to be a prime sampling frame for capturing pediatric hospitalists, its use is riddled with complications that make it less than the ideal sampling frame. First, the listserv does not collect demographic data on subscribers, making it impossible to determine if survey respondents are truly representative of pediatric hospitalists in the United States. Percentages of pediatric hospital physicians, midlevel providers, subspecialty providers, and trainees among the listserv subscribers are unknown. With variability among practice settings,12,13 small samples taken from respondents on the listserv are unlikely to represent a generalizable population. In addition, because the listserv is a dynamic mailing list, subscribers may have more than 1 registered e-mail address, and the listserv may include e-mail addresses that are rarely checked.14 This makes calculating an accurate response rate difficult.

Another common approach to creating a sampling frame is to collaborate with colleagues, with each serving as a site lead at their given institution. This is preferable to using the listserv because it allows for the identification of unique eligible respondents and the calculation of a final denominator for a response rate. A major limitation, however, is the generalizability of a convenience sample of a few institutions to a larger population in regard to the characteristics measured in the survey.

Ultimately, choosing a sampling frame is a balance of feasibility with generalizability. When designing a survey around provider perspectives on high-flow nasal cannula use, for example, surveying pediatric hospitalists from every hospitalist program across the country would provide a known number and capture differences in region, site practices, local customs, and resource variability, but it would also be expensive, time-consuming, and impractical. Instead, as illustrated by Leyenaar et al15 in using a stratified sample of hospitals within the United States to characterize the direct admission practices across diverse hospitals, a stratified multisite sample specifically targeting factors that may influence these perspectives (such as resource variability, location within the country, hospital type, and size of hospitalist group) may provide a sampling frame that balances feasibility with a generalizable sample.
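The idea of a stratified multisite sample can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used by Leyenaar et al: the hospital list, the stratification factors (region and hospital type), the 20% sampling fraction, and the function name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_key, fraction, seed=0):
    """Draw roughly the same fraction of units from every stratum of a frame."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_key(unit)].append(unit)
    sample = []
    for stratum in sorted(strata):
        units = strata[stratum]
        # Take at least one unit per stratum so that small strata are not lost.
        n = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, n))
    return sample

# Hypothetical frame: per region, 5 children's and 10 community hospitals.
combinations = [
    (region, hospital_type)
    for region in ("Northeast", "South", "Midwest", "West")
    for hospital_type in ("children's", "community", "community")
    for _ in range(5)
]
hospitals = [
    {"id": i, "region": region, "type": hospital_type}
    for i, (region, hospital_type) in enumerate(combinations)
]

sample = stratified_sample(
    hospitals, lambda h: (h["region"], h["type"]), fraction=0.2
)
print(len(sample), "hospitals sampled from", len(hospitals))
```

Because every combination of region and hospital type contributes to the sample, no stratum can be missed by chance, which is the property that distinguishes this approach from a convenience sample of a few institutions.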

Next, understanding response rates and nonresponse bias is critical for the interpretation of survey data.16 Accurate classification of sample members as respondents, nonrespondents, and ineligibles, according to the guidelines of the American Association for Public Opinion Research, allows for correct calculation of response rates and, ultimately, comparability across studies.17 Although low response rates are often interpreted as indicators of surveys with nonresponse bias, neither low nor high response rates are equivalent to nonresponse bias.2
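To illustrate why the classification of sample members matters, the following Python sketch computes a conservative response rate in which only complete responses count in the numerator, ineligible sample members are removed from the denominator, and cases of unknown eligibility stay in the denominator. The disposition labels and the counts are hypothetical simplifications; the American Association for Public Opinion Research guidelines define several response rate formulas and a much finer set of disposition codes, and they should be consulted for the exact definitions.

```python
from collections import Counter

def response_rate(dispositions):
    """Complete responses divided by all sample members not known to be ineligible."""
    counts = Counter(dispositions)
    denominator = sum(n for label, n in counts.items() if label != "ineligible")
    if denominator == 0:
        raise ValueError("no eligible or potentially eligible sample members")
    return counts["complete"] / denominator

# Hypothetical final dispositions for 400 invited sample members.
dispositions = (
    ["complete"] * 180
    + ["partial"] * 20
    + ["refusal"] * 50
    + ["noncontact"] * 120
    + ["unknown_eligibility"] * 10
    + ["ineligible"] * 20
)

rate = response_rate(dispositions)
naive_rate = dispositions.count("complete") / len(dispositions)
print(f"response rate: {rate:.3f}, completes / invitations: {naive_rate:.3f}")
```

In this made-up example, dividing completes by all invitations understates the response rate because 20 invited individuals were never eligible to respond. Reporting how each sample member was classified makes the rate comparable across studies.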

Nonresponse bias occurs if respondents differ from nonrespondents with regard to a specific survey question. Low response rates increase the likelihood of nonresponse bias occurring in a statistic of interest, such as a mean, that is based on that survey question. For example, when designing a Web-based survey to determine the prevalence of nocturnists within hospital medicine, if the title of the survey is, “A Survey of Pediatric Hospital Medicine Division Chiefs Regarding Nocturnists,” the title itself may predispose division chiefs without nocturnists to disregard the survey. As such, the resulting data would disproportionately reflect programs with nocturnists, thus overestimating the prevalence, an example of nonresponse bias.

The assessment of the existence of nonresponse bias is usually difficult because the characteristics of nonrespondents are often unknown. To mitigate this, a well-defined sampling frame with known characteristics allows for statistical comparisons of respondents and nonrespondents after data collection and some approximate assessment of nonresponse bias.16 An example of addressing nonresponse bias within the statistical analyses is the study by Joyce et al,18 in which sample weights were created on the basis of known demographic factors and subsequently accounted for within regression modeling. A study with a well-defined sampling frame with described characteristics may still have generalizable results even with a low response rate if researchers can demonstrate that (1) the survey respondents adequately reflect the sampling frame, and (2) the sampling frame reflects the target population, with regard to characteristics that are likely to influence the survey outcomes. Additional efforts should still be made to increase response rates as much as possible, because increasing the overall size of the final data set improves the power of statistical analyses.
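A simple way to see how known frame characteristics can be used after data collection is poststratification weighting, sketched below in Python. This is an illustrative assumption and not necessarily the weighting approach of Joyce et al: the program-size strata, all counts, and the function names are hypothetical, and such weights correct only for differences captured by the characteristic used to build them.

```python
from collections import Counter

def poststratification_weights(frame_counts, respondent_strata):
    """Weight per stratum = share of the frame / share of the respondents."""
    frame_total = sum(frame_counts.values())
    respondent_counts = Counter(respondent_strata)
    respondent_total = len(respondent_strata)
    return {
        stratum: (frame_counts[stratum] / frame_total)
        / (respondent_counts[stratum] / respondent_total)
        for stratum in respondent_counts
    }

def weighted_mean(values, strata, weights):
    """Mean of values in which each respondent counts by its stratum weight."""
    numerator = sum(weights[s] * v for v, s in zip(values, strata))
    denominator = sum(weights[s] for s in strata)
    return numerator / denominator

# Hypothetical sampling frame: 40 large and 60 small hospitalist programs.
frame_counts = {"large": 40, "small": 60}

# Hypothetical respondents: (program size, 1 if the program has nocturnists).
respondents = (
    [("large", 1)] * 24 + [("large", 0)] * 6
    + [("small", 1)] * 6 + [("small", 0)] * 14
)
strata = [size for size, _ in respondents]
has_nocturnist = [answer for _, answer in respondents]

weights = poststratification_weights(frame_counts, strata)
unweighted = sum(has_nocturnist) / len(has_nocturnist)
weighted = weighted_mean(has_nocturnist, strata, weights)
print(f"unweighted prevalence: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this made-up example, large programs are overrepresented among respondents (60% of respondents versus 40% of the frame) and more often report nocturnists, so the unweighted estimate is higher than the weighted one. Comparing the two estimates gives an approximate sense of how much the respondent mix may be distorting a statistic of interest.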

Lastly, Web-based surveys (using survey software such as REDCap,19 SurveyMonkey,20 or Qualtrics21) have gained significant popularity over the past few decades because of their ease of use, availability, and streamlined data collection. Careful planning around design and administration is required to yield adequate Web survey response rates, and details related to survey administration should be determined before the survey is distributed. Potential options to maximize respondent participation when using a Web-based survey include personalizing e-mail contact to the respondent, using multiple reminders, tailoring the invitation and reminders to emphasize the importance of the survey to each participant, and providing incentives, as appropriate.1

Survey Design Principles

Once the decision to use survey methodology to answer a research question has been made and the sample frame identified, several straightforward survey design principles can guide development while keeping the end goal in mind: the research question to be answered and the constructs to be measured. Gehlbach et al5 summarize a rigorous and solid survey design process developed since the 1940s by survey researchers into a 7-step approach: conducting a literature review, conducting interviews and/or focus groups, synthesizing the results from the literature review and interviews and/or focus groups, developing questions, establishing content validity through expert interviews, conducting cognitive interviews, and conducting pilot testing.3,6

The initial steps help ground the development of a survey. First, starting with a comprehensive literature review ensures that the identified survey topic is relevant within a given field (step 1). By nature of curiosity and through the formation of any research question, researchers often become attached to ideas. Anchoring the research question within the available literature helps to hone it. A comprehensive literature review may also uncover already published survey instruments and provide insight into the quality and applicability of developed tools. Publication of a survey does not inherently make the survey a good tool or a tool with valid results. Surveys are often idiosyncratic, in that they produce valid results only when used in the same target population as the one for which the tool was originally developed. For example, the MCAT is accepted as producing valid results for predicting premedical student success within medical school. However, if one asked graduating fellows in nephrology to take the MCAT, the results would not be predictive of future career success. As such, careful and thoughtful evaluation of a tool before use in a new population is critical. However, if a tool does exist with established validity evidence for the same sample population, using a previously developed tool or scale is preferable to developing one de novo and will strengthen the survey results.

Next, preliminary discussions or even interviews, if warranted, with the target population will help to ensure that the research question is a topic with relevance and provides a local context in which to appropriately situate the survey questions (step 2). Involving topic experts can provide insight as the research team compiles and synthesizes the literature along with results from interviews or focus groups with the target population to begin formulating a preliminary instrument (step 3).

Step 4 launches into writing survey questions. It is at this stage that early involvement of an expert with survey design expertise and/or psychometric experience can help direct question development, avoiding common pitfalls and planning survey design so that the data obtained reflect the aims of the study. Although a comprehensive review of writing good survey questions is beyond the scope of this article, in Table 1 we present a few common mistakes to avoid when writing survey questions, including double-barreled questions, negatively worded items, using statements instead of questions, response scales that do not match the question, asking socially desirable or anxiety-provoking questions, and choosing answer categories that are not mutually exclusive.


Table 1. Key Mistakes to Avoid When Constructing Survey Questions

Lastly, when writing survey questions, it is often tempting to include free text responses with the intention of qualitatively analyzing respondents’ answers. Given the limited space provided in most surveys, respondents rarely provide rich responses, reducing the quantity and quality of substantial contributions.22 As such, free text responses rarely meet the standard for excellent qualitative research.23 Although including these responses to perform supplementary analysis to the primary survey research is acceptable, unless rigorous qualitative methods are applied in the design and distribution of the survey, one ought to avoid describing the survey as “mixed methods.”24 These supplementary analyses, however, involve coding of all open-ended responses, and it is usually time intensive to yield meaningful analytical results in comparison with using closed-ended responses.

Testing the Survey

Steps 1 through 4 of the Gehlbach et al5 framework are routinely attempted, albeit rarely delineated explicitly within articles. Stopping after step 4 with item development and distributing the survey at this point, however, hinders the establishment of a high-quality survey instrument and limits the conclusions that can be drawn from collected data.

After the initial survey instrument is constructed, researchers should reengage content topic experts and survey design experts to establish content validity (step 5). Content validity is the degree to which the items of the survey instruments reflect the construct being measured.25 Items should not only pass the “sniff test,” but the research team should ask the experts to evaluate 4 main areas: individual question clarity, topic relevance, language level, and missing content.26

Once content validity has been established, the next step in survey design is the testing of questions through cognitive interviews, also called cognitive pretesting (step 6).27 A common form of cognitive interviews involves in-person interviews in which the interviewer asks the respondent to “think aloud” when answering the survey question, allowing the survey designer to identify problematic questions before the survey is conducted.28 Although the respondents might be able to verbalize problems, the interviewer can also ask detailed questions after the respondents answer each question, eliciting more information on how they arrived at the answer (eg, how they would ask the question in their own words or how they would define a certain term). In general, cognitive interviews elucidate issues in question comprehension and sensitivity, memory retrieval, and the interest and motivation of the respondent.

Cognitive interview participants should be recruited from the same population that will receive the final survey, varying the background of the participants with regard to those characteristics that most likely will influence understanding or interpretation of the questions. Obvious problems, like missing response categories, should be corrected immediately, before the next interview starts, whereas major wording changes should be discussed with the research team and only made if more than 1 participant encountered the problem. Any major changes to the survey should be tested through cognitive interviews again.

Similar to qualitative interviews, the number of cognitive interviews that should be conducted is variable. Cognitive interviews can be concluded when no additional major problems are found in the survey. Often, this occurs after a handful of interviews if the survey was carefully constructed. This process should be clearly documented in the article.

The last step before survey administration is piloting the survey (step 7). The goal of piloting is to collect evidence that the data produced by the survey are consistent with the expectations of the research team. Working with a statistician, the researchers should administer the survey to a small subset of the survey sample. The resulting data can then be analyzed to assess if individual items show the desired variance and if the actual survey completion time matches what was expected.
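The two pilot checks named above, item variance and completion time, can be sketched in Python as follows. The pilot responses, the 5-point response scale, the variance threshold of 0.25, and the expected completion time of 5 minutes are hypothetical assumptions chosen for illustration; appropriate criteria should be set with the statistician before the pilot.

```python
from statistics import median, pvariance

def flag_low_variance_items(responses, min_variance):
    """Return the items whose pilot answers vary less than the chosen threshold."""
    return sorted(
        item
        for item, answers in responses.items()
        if pvariance(answers) < min_variance
    )

# Hypothetical pilot data: 8 respondents answering 3 items on a 5-point scale.
pilot_responses = {
    "q1": [1, 2, 3, 4, 5, 3, 2, 4],
    "q2": [5, 5, 5, 5, 5, 4, 5, 5],  # nearly everyone picks the same category
    "q3": [2, 3, 2, 4, 3, 3, 2, 3],
}
# Hypothetical completion times in minutes for the same 8 respondents.
completion_minutes = [6, 7, 9, 8, 12, 7, 6, 10]
expected_minutes = 5  # assumed estimate made by the research team

flagged = flag_low_variance_items(pilot_responses, min_variance=0.25)
median_minutes = median(completion_minutes)

print("items with little variance:", flagged)
print("median completion time:", median_minutes, "vs expected", expected_minutes)
```

An item on which nearly all pilot respondents choose the same category may need rewording or a different response scale, and a median completion time well above the estimate suggests that the survey should be shortened or that the invitation should state a more realistic time.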

Taking the Next Steps in Designing a Survey

There are many additional aspects to consider when designing a survey that are not touched on in this article, including survey administration, data management, and use of conceptual models or frameworks. Available resources for additional reading into survey design when planning a research study are listed in Table 2.


Table 2. Additional Resources for Different Aspects of Survey Design


Through choosing appropriate research questions to be answered by a survey, thoughtfully selecting a sampling frame, and using a structured approach to instrument design, including cognitive pretesting and piloting, researchers can design a methodologically rigorous survey with generalizable and reproducible results. Although this process may not be quick or easy, it sets a survey up to collect meaningful data, ultimately increasing potential impact and publishability.


We thank Lilliam Ambroggio, Kevin Messacar, and Jeannie Zuk for valuable comments on earlier drafts of this article.


  • POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

  • FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.

  • FUNDING: No external funding.