How to Review Internal Validity in a Research Report
Systematic review of guidelines for internal validity in the design, conduct and analysis of preclinical biomedical experiments involving laboratory animals
Abstract
Over the last two decades, awareness of the negative repercussions of flaws in the planning, conduct and reporting of preclinical research involving experimental animals has been growing. Several initiatives have set out to increase transparency and internal validity of preclinical studies, mostly publishing expert consensus and experience. While many of the points raised in these various guidelines are identical or similar, they differ in detail and rigour. Most of them focus on reporting, and only few of them cover the planning and conduct of studies. The aim of this systematic review is to identify existing experimental design, conduct, analysis and reporting guidelines relating to preclinical animal research. A systematic search in PubMed, Embase and Web of Science retrieved 13 863 unique results. After screening these on title and abstract, 613 papers entered the full-text assessment phase, from which 60 papers were retained. From these, we extracted 58 unique recommendations on the planning, conduct and reporting of preclinical animal studies. Sample size calculations, adequate statistical methods, concealed and randomised allocation of animals to treatment, blinded outcome assessment and recording of animal flow through the experiment were recommended in more than half of the publications. While we consider these recommendations to be valuable, there is a striking lack of experimental evidence on their importance and relative effect on experiments and effect sizes.
- scientific rigor
- bias
- internal validity
- preclinical studies
- animal studies
https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
Introduction
In recent years, there has been growing awareness of the negative repercussions of shortcomings in the planning, conduct and reporting of preclinical animal research.1 2 Several initiatives involving academic groups, publishers and others have set out to increase the internal validity and reliability of primary research studies and the resulting publications. Additionally, several experts or groups of experts across the biomedical spectrum have published experience and opinion-based guidelines and guidance. While many of the points raised are broadly similar between these various guidelines (probably in part reflecting the observation that many experts in the field are part of more than one initiative), they differ in detail, rigour and, in particular, whether they are broadly generalisable or specific to a single field. While all these guidelines cover the reporting of experiments, only a few specifically address rigorous planning and conduct of studies,3 4 which might increase validity from the earliest possible point.5 Consequently, it is difficult for researchers to choose which guidelines to follow, especially at the stage of planning future studies.
We aimed to identify all existing guidelines and reporting standards relating to experimental design, conduct and analysis of preclinical animal research. We also sought to identify literature describing (either through primary research or systematic review) the prevalence and impact of perceived risks of bias pertaining to the design, conduct, analysis and reporting of preclinical biomedical research. While we focus on internal validity as influenced by experimental design, conduct and analysis, we recognise that factors such as animal housing and welfare are highly relevant to the reproducibility and generalisability of experimental findings; however, these factors are not considered in this systematic review.
Methods
The protocol for this systematic review has been published in ref 6. The following amendments to the systematic review protocol were made: in addition to the systematic literature search, to capture standards set by funders or organisations that are not (or not yet) published, it was planned to conduct a Google search for guidelines published on the websites of major funders and professional organisations using the systematic search string below.6 This search, however, yielded either no returns, or, in the case of the National Institutes of Health, identified over 193 000 results, which was an unfeasibly large number to screen. Therefore, for practical reasons this part of the search was excluded from the initial search strategy. Reassessing the goals of this review, we decided to focus on internal validity; in the protocol we used the term 'internal validity and reproducibility'. In the protocol, we mention that the aim of this systematic review is an attempt to harmonise guidelines and create a unified framework. This is still under way and will be published separately.
Search strategy
We systematically searched PubMed, Embase via Ovid and Web of Science to identify guidelines published in English in peer-reviewed journals before 10 January 2018 (the day the search was conducted), using appropriate terms for each database optimised from the following search string (as can be found in the protocol6):
(guideline OR recommendation OR recommendations) AND ('preclinical model' OR 'preclinical models' OR 'disease model' OR 'disease models' OR 'animal model' OR 'animal models' OR 'experimental model' OR 'experimental models' OR 'preclinical study' OR 'preclinical studies' OR 'animal study' OR 'animal studies' OR 'experimental study' OR 'experimental studies').6
Furthermore, as many of the researchers participating in the European Quality in Preclinical Data project (http://eqipd.org/) are experts in the field of experimental standardisation, they were contacted personally to identify additional relevant publications.
Inclusion and exclusion criteria
We included all articles or systematic reviews in English which described or reviewed guidelines making recommendations intended to improve the validity or reliability (or both) of preclinical animal studies through optimising their design, conduct and analysis. Articles that focused on toxicity studies or veterinary drug testing were not included. Although reporting standards were not the key primary objective of this systematic review, these were also included, as they might contain useful relevant information.
Screening and information management
We combined the search results from all sources and identified duplicate search returns and the publication of identical guidelines by the same author group in several journals based on the PubMed ID, DOI, and the title, journal and author list. Unique references were then screened in two phases: (1) screening for eligibility based on title and abstract, followed by (2) screening for definitive inclusion based on full text. Screening was performed using the Systematic Review Facility (SyRF) platform (http://syrf.org.uk). Ten reviewers contributed to the screening phase; each citation was presented to two independent reviewers with a real-time computer-generated random selection of the next citation to be reviewed. Citations remained available for screening until two reviewers agreed that it should be included or excluded. If the first two reviewers had disagreed, the citation was offered to a third, but reviewers were not aware of previous screening decisions. A citation could not be offered to the same reviewer twice. Reviewers were not blinded to the authors of the presented record. In the first stage, two authors screened the title and abstract of the retrieved records for eligibility based on predefined inclusion criteria (see above). The title/abstract screening stage aimed to maximise sensitivity rather than specificity: any paper considered to be of any possible interest was included.
Articles included after the title-abstract screening were retrieved as full texts. Articles for which no full-text version could be obtained were excluded from the review. Full texts were then screened for definitive inclusion and data extraction. At both screening stages, disagreements between reviewers were resolved by additional screening of the reference by a third adjudicating reviewer, who was unaware of the individual judgements of the first two reviewers. All data were stored on the SyRF platform.
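The duplicate-detection step described above can be sketched in a few lines; the record field names and the normalisation rules are illustrative assumptions, not the actual SyRF implementation:

```python
import re

def fingerprint(record):
    """Return the strongest available identity key for a record:
    PubMed ID, then DOI, then a normalised title/journal/author key."""
    if record.get("pmid"):
        return ("pmid", record["pmid"])
    if record.get("doi"):
        return ("doi", record["doi"].lower())
    norm = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
    return ("meta", norm(record["title"]), norm(record["journal"]),
            norm(";".join(record.get("authors", []))))

def deduplicate(records):
    """Keep only the first record seen for each unique fingerprint."""
    seen, unique = set(), []
    for rec in records:
        key = fingerprint(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

The cascade mirrors the order of reliability stated in the text: a database identifier beats a DOI, which beats a fuzzy metadata match.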
Extraction, aggregation and diligence classification
From the publications identified, we extracted recommendations on the planning, conduct and reporting of preclinical animal studies as follows:
Elements of the included guidelines were identified using an extraction form (box 1) inspired by the results from Henderson et al.5 Across guidelines, the elements were ranked based on the number of guidelines in which that element appeared. Extraction was not done in duplicate, but only once. As the extracted results in this case are not quantitative, but qualitative, meta-analysis and risk of bias assessment are not appropriate for this review. Still, we applied a diligence classification of the guidelines based on the following system, improving level of evidence from 1 to 3 and support from A to B:
Box 1
Extraction form
- Matching or balancing treatment allocation of animals.
- Matching or balancing sex of animals across groups.
- Standardised handling of animals.
- Randomised allocation of animals to treatment.
- Randomisation for analysis.
- Randomised distribution of animals in the animal facilities.
- Monitoring emergence of confounding characteristics in animals.
- Specification of unit of analysis.
- Addressing confounds associated with anaesthesia or analgesia.
- Selection of appropriate control groups.
- Concealed allocation of treatment.
- Study of dose–response relationships.
- Use of multiple time points measuring outcomes.
- Consistency of outcome measurement.
- Blinding of outcome assessment.
- Establishment of primary and secondary end points.
- Precision of effect size.
- Management of conflicts of interest.
- Choice of statistical methods for inferential analysis.
- Recording of the flow of animals through the experiment.
- A priori statements of hypothesis.
- Choice of sample size.
- Addressing confounds associated with handling.
- Characterisation of animal properties at baseline.
- Optimisation of complex treatment parameters.
- Faithful delivery of intended treatment.
- Degree of characterisation and validity of outcome.
- Treatment response along mechanistic pathway.
- Assessment of multiple manifestations of disease phenotype.
- Assessment of outcome at late/relevant time points.
- Addressing treatment interactions with clinically relevant comorbidities.
- Use of validated assay for molecular pathways assessment.
- Definition of outcome measurement criteria.
- Comparability of control group characteristics to those of previous studies.
- Reporting on breeding scheme.
- Reporting on genetic background.
- Replication in different models of the same disease.
- Replication in different species or strains.
- Replication at different ages.
- Replication at different levels of disease severity.
- Replication using variations in handling.
- Independent replication.
- Addressing confounds associated with experimental setting.
- Addressing confounds associated with setting.
- Preregistration of study protocol and analysis procedures.
- Pharmacokinetics to support treatment decisions.
- Definition of treatment.
- Interstudy standardisation of end point selection.
- Define programmatic purpose of research.
- Interstudy standardisation of experimental design.
- Research within multicentre consortia.
- Critical appraisal of literature or systematic review during design stage.
- (Multiple) free text.
1. Recommendations of individuals or small groups of individuals based on individual experience only.
- Published stand-alone.
- Endorsed or initiated by at least one publisher or scientific society as stated in the publication.
2. Recommendations by groups of individuals, through a method which included a Delphi process or other means of structured decision-making.
- Published stand-alone.
- Endorsed or initiated by at least one publisher or scientific society as stated in the publication.
3. Recommendations based on a systematic review.
- Published stand-alone.
- Endorsed or initiated by at least one publisher or scientific society as stated in the publication.
Results
Search and study selection
A flow chart of the search results and screening process is found in figure 1. Our systematic search returned 13 863 results, with 3573 papers from PubMed, 5924 from Web of Science and 5982 from Embase. After first screening on title and abstract, 828 records were eligible for the full-text screening stage. After removing duplications (69), non-English language resources (48), conference abstracts (25), book chapters (14) and announcements (4), 676 records remained. Of these, 62 publications were retained after full-text screening. We later identified two further duplicate publications of the same guidelines in different journals, giving a final list of 60 publications.5 7–65
The project members did not identify any additional papers that had not been identified by the systematic search.
Diligence classification
More than half of the included publications (32) were narrative reviews that fell under the 1A category of our rating system (recommendations of individuals or small groups of individuals based on individual experience only, published stand-alone).7 9 10 14 15 18 20 25 27 29 30 33 35 36 39 41–43 45 47–55 57 60 61 65 An additional 22 publications were consensus papers or proceedings of consensus meetings for journals or scientific or governmental organisations (category 1B).3 4 8 12 13 17 19 24 26 28 32 34 37 38 44 46 56 59 62–64 66 None of these reported the use of a Delphi process or systematic review of existing guidelines. The remaining six publications were systematic reviews of the literature (category 3A).5 11 21 31 40 58
Extracting components of published guidance
From the 60 publications finally included, we extracted 58 unique recommendations on the planning, conduct and reporting of preclinical animal studies. The absolute and relative frequency for each of the extracted recommendations is provided in table 1. Sample size calculations, adequate statistical methods, concealed and randomised allocation of animals to treatment, blinded outcome assessment and recording of animal flow through the experiment were recommended in more than half of the publications. Only a few publications (≤5) mentioned preregistration of experimental protocols, research conducted in large consortia, replication at different levels of disease or by variation in handling and optimisation of complex treatment parameters. The extraction form allowed the reviewers in free-text fields to identify and extract additional recommendations not covered in the prespecified list, but this facility was rarely used, with only 'publication of negative results' and 'clear specification of exclusion criteria' extracted in this way by more than one reviewer. The full results table of this stage is published as a csv file on figshare under the DOI 10.6084/m9.figshare.9815753.
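To make the most frequently recommended measure concrete, a sample size calculation for a two-sided, two-sample comparison can be sketched with the standard normal-approximation formula; the effect size, alpha and power values below are hypothetical examples, not drawn from the reviewed guidelines:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Animals per group for a two-sided, two-sample comparison,
    given a standardised effect size (Cohen's d), using
    n = 2 * ((z_alpha + z_beta) / d)^2, rounded up."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = z(power)            # ~0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A large effect (d = 1.0) at 80% power and alpha = 0.05:
print(n_per_group(1.0))  # → 16 per group
```

The approximation slightly understates the t-test requirement at very small n, but it illustrates why halving the expected effect size roughly quadruples the number of animals needed.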
Table 1
Extraction results
Discussion
Based on our systematic literature search and screening using predefined inclusion and exclusion criteria, we identified 60 published guidelines for the planning, conduct or reporting of preclinical animal research. From these publications, we extracted a comprehensive list of 58 experimental rigour recommendations that the authors had proposed as important to increase the internal validity of animal experiments. Most recommendations were repeated in a relevant proportion of the publications (sample size calculations, adequate statistical methods, concealed and randomised allocation of animals to treatment, blinded outcome assessment and recording of animal flow through the experiment in more than half of the cases), showing that there is at least some consensus for those recommendations. In many cases this may be because authors are on more than one of the expert committees for these guidelines, and many of them build on the same principles and cite the same sources of inspiration (ie, doing for the field what the Consolidated Standards of Reporting Trials did for clinical trials).66 67 There are also reasons why the consensus was not universal: many of the publications focus on single aspects (eg, statistics21 or sex differences60) or specific medical fields or diseases.13 37 38 63 In addition, the narrative review character of many of the publications may have led to authors focusing on elements they considered more important than others.
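Two of the consensus recommendations, randomised allocation to treatment and blinded outcome assessment, can be sketched together in a few lines; the animal IDs, group names and coding scheme below are illustrative assumptions, not a prescription from any of the reviewed guidelines:

```python
import random

def randomise(animal_ids, groups, seed):
    """Shuffle animals and assign them to groups in equal blocks."""
    rng = random.Random(seed)  # seed recorded so the allocation is auditable
    ids = list(animal_ids)
    rng.shuffle(ids)
    per_group = len(ids) // len(groups)
    allocation = {g: ids[i * per_group:(i + 1) * per_group]
                  for i, g in enumerate(groups)}
    # Blinding: the outcome assessor sees only neutral codes (A1, A2, ...);
    # the code-to-group key is held by a third party until unblinding.
    codes = {aid: f"A{n}" for n, aid in enumerate(sorted(animal_ids), 1)}
    return allocation, codes

alloc, codes = randomise(range(1, 21), ["control", "treatment"], seed=42)
```

Keeping the code-to-group key separate from the assessor is the concealment step; the recorded seed makes the allocation reproducible for audit.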
Indeed, more than half (32 out of 60) of the publications reviewed here were topical reviews by a small group of authors (usually fewer than five). Another 22 (37%) were proceedings of consensus meetings or consensus papers set in motion by professional scientific or governmental organisations. It is noteworthy that none of these publications provide any rationale or justification for the validity of their recommendations. None used a Delphi process or other means of structured decision-making as suggested for clinical guidelines68 to reduce bias,69 and none reported using a systematic review of existing guidelines to inform themselves about the literature. Of course, many of these expert groups will have been informed by pre-existing reviews (the remaining six included here were systematic literature reviews). However, there is a consistent feature across recommendations: the steps recommended to increase validity are considered to be self-evident, and a basis in experiments and evidence is seldom linked or provided. There are hints that applying these principles does contribute to internal validity, as it has been shown that the reporting of measures to reduce risks of bias is associated with smaller outcome effect sizes,70 while other studies have not found such an association.71 However, it is unclear if the measures taken are the perfect ones to reduce bias, or if they are just surrogate markers for more awareness and thus more thorough research conduct. We consider this to be problematic for at least two reasons: first, to increase compliance with guidelines it is crucial to keep them as simple and as easy to implement as possible.
An endless checklist can easily lead to fatalistic thinking in researchers desperately wanting to publish, and it could be debated whether guidelines are seen by some researchers as hindering their progression rather than being an aid to conducting the best possible science; still, there is a difference between an 'endless' list and a 'minimal set of rules' that guarantees good research reproducibility. Second, each procedure that is added to an experimental set-up can in itself lead to sources of variation, so these should be minimised unless it can be shown that they add value to experiments.
Compliance is a significant problem for guidelines, as recently reported with the widely adopted Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines of the UK's National Centre for the 3Rs.66 72 This is not attributed to blind spots in the ARRIVE guidelines. While enforcement by endorsing journals may be important,73 74 a recent randomised blinded controlled study suggests that even an insistence on completing an ARRIVE checklist has little or no impact on reporting quality.75 We believe that training and availability of tools to improve research quality will facilitate implementation of guidelines over time, as they become more prominent in researchers' mindset.
This systematic review has important limitations. The main limitation is that we used single extraction only, which was due to feasibility, but creates a source of uncertainty that we cannot rule out. We decided so as we think the bias created here is significantly lower than in a quantitative extraction that includes meta-analysis. Protocol-wise, we only included publications in English, reflecting the limited language pool of our team. Our broad search strategy identified more than 13 000 results, but we did not identify reports or systematic reviews of primary research showing the importance of specific recommendations,76 which must reflect a weakness in our search strategy. Additionally, our plan to search the websites of professional organisations and funding bodies failed due to reasons of practicality. Limiting the results included from a Google search would have been a practical solution to overcome this issue, which we failed to determine at protocol generation. Although being aware of single recommendations outside of publication, we did not include those to keep methods reproducible. In addition, we focused the search on 'guidelines', instead of broadening it by adding, for example, 'guidance', 'standard' or 'policy', as we feared these terms would inflate the search results by magnitude (particularly 'standard' is a broadly used word). Hence, we cannot define whether we have included all important sources of literature. As hinted above, the results presented here also only paint an overview of the literature consensus, which should by no means be mistaken for an absolute ground truth of which steps need to be taken to improve internal validity in animal experiments.
Indeed, literature debating the quality of these measures is sparse, and many of them have been borrowed from the clinical trials community or been considered self-evident from the literature. There is an urgent need for experimental testing of the importance of most of these measures, to provide better evidence of their effect.
Acknowledgments
We thank Alice Tillema of Radboud University, Nijmegen, the Netherlands, for her help in constructing and optimising the systematic search strings.
Review history and Supplementary material
- Data Supplement: Peer review history and previous versions
Copyright information:
© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY. Published by BMJ. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
Source: https://openscience.bmj.com/content/4/1/e100046