Chapter 30 Doing a Systematic Review

[[ Note: this chapter is very much in-progress, and so it contains some parts that are readyish and some parts that are very drafty ]]

A systematic review is the only reliable way to obtain an overview of what is known about a given topic. Systematic reviews differ from conceptual or narrative reviews in that their procedures earn them the right to make strong statements: those procedures can and should be extremely rigorous, systematic, transparent, and reproducible.

This means that as much as possible, the decisions that are taken must be clearly documented and justified, so that the inevitable biases that the research team bring to the project can be taken into account when interpreting the results. It also means that the preparation phase is vital.

30.1 Planning a systematic review

An extensive and inclusive registration form for systematic reviews exists, and that is a good place to start planning your systematic review. Planning a systematic review proceeds in the opposite order from conducting it. Specifically, when planning, you and your research team have to achieve consensus on the following matters, in this order:

  1. The goals and/or research question(s);
  2. As a function of this, the entities you will extract (see below);
  3. As a function of these entities and the goal/research questions, the exclusion criteria you will use during screening;
  4. As a function of the goal/research questions and the exclusion criteria, the conceptual form of the query you will use;
  5. Which database(s) and interface(s) you will use.

This will determine the scope of your review. Although these steps depend on the previous steps’ output, in practice, this process is often nonlinear and iterative. For example, you often test draft queries in your interface/database combinations to see how many hits you obtain, potentially deciding to adjust your exclusion criteria or even your goals or research questions depending on what you find.

30.2 Query Crafting and Databases Versus Interfaces

Running your query is the first operational step of your systematic review: it’s often one of the first things you do after you have publicly frozen your preregistration. In that sense it’s kind of exciting, but ironically the results you obtain will normally not be surprising, since you repeatedly test your query while crafting it.

30.2.1 Queries as logical expressions

A query is a logical expression that specifies the conditions that must be met for bibliographic records to be returned by the interfaces that you use to search the bibliographic databases. You first craft this query in a conceptual form, not worrying about the syntax that you will have to use to specify your query in a way the different interfaces can parse.

The simplest queries typically bind together sets of synonyms using the logical conjunction operator (often represented by AND, &, or &&). Each set of synonyms binds together various terms using the logical disjunction operator (often represented by OR, |, or ||). For example, imagine we’re doing a systematic review on the determinants of ecstasy use. In that case, a simple query could be:

("determinant" OR "determinants" OR "correlate" OR "correlates") AND ("ecstasy" OR "XTC" OR "MDMA")

This query has two terms. We could label the first “determinants,” since it is intended to capture all synonyms for “determinants” (it does so badly, since I wanted to keep this example short; such lists of synonyms are normally much longer), and we could label the second “ecstasy,” since its task is to match all records that contain a word for “ecstasy” (again, doing so badly to enable a brief example).
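As a minimal sketch of this structure (plain Python, not tied to any interface; the function name `build_query` is made up for this illustration), each set of synonyms can be joined with OR and the resulting sets joined with AND:

```python
def build_query(synonym_sets):
    """Join each set of synonyms with OR, then join the sets with AND."""
    groups = []
    for synonyms in synonym_sets:
        # Quote every term so multi-word terms stay together.
        quoted = ['"' + term + '"' for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# The two (deliberately short) synonym sets from the example above.
determinants = ["determinant", "determinants", "correlate", "correlates"]
ecstasy = ["ecstasy", "XTC", "MDMA"]

conceptual_query = build_query([determinants, ecstasy])
print(conceptual_query)
```

Keeping the synonym lists as data and generating the query string from them makes it easier to extend a list later without retyping the whole expression.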

Using these two logical operators, it’s also possible to build more complex queries. For example, if we knew enough about substance use to realise that the determinants of trying out ecstasy (i.e. “initiation” of ecstasy use) differ from those you would find in a determinant study of “using ecstasy,” the second term would become more complex:

("determinant" OR "determinants" OR "correlate" OR "correlates") AND (("ecstasy" OR "XTC" OR "MDMA") AND (("using" OR "use") OR ("trying out" OR "initiating")))

The complexity of the query you end up with is often related to the complexity or “subtlety” of your goal or research question. If you’re conducting a scoping review, where you aim to map out the literature on a specific topic, you will generally have simpler queries than when you have a specific, narrow research question.

Query complexity is often also related to the richness of the literature. If the literature on a topic is very extensive, the review may become unfeasible if you use a very simple, broad query: you might obtain tens of thousands of hits without the resources to screen all of those. Conversely, if you’re surveying a smaller literature, you can afford to have a less sophisticated query. Since screening costs a lot of time, it usually pays off to spend a lot of time developing your query so that you minimize the number of irrelevant hits.

30.2.2 Database fields

In addition to the terms themselves, you can specify the fields you want to search. For example, you can search all text fields (the default in most interfaces if you don’t specify one or more fields), or only the title field, or the title and the abstract and the keyword fields, et cetera. Usually you will want to search the titles at the bare minimum; and unless you are confident of relatively standardized vocabulary in a field, you’ll often want to add the abstract and keyword fields. Including fields like journal name, authors, or affiliations usually doesn’t make sense, so omitting explicitly specified field names is very rare.

Sometimes interfaces will allow you to specify multiple fields in a query, for example, indicating that a search term (e.g. a set of synonyms) can occur in either the title or the abstract; but often that’s not possible, and you have to duplicate parts of your query. This can cause queries to grow exponentially, and this is one of the reasons why it is important to craft your query on the conceptual level before starting the translation into the interface languages.
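To illustrate how duplicating parts of a query across fields makes it grow, here is a small sketch. The `field(...)` notation and the field names `title`, `abstract`, and `keywords` are invented for this illustration and are not the syntax of any real interface:

```python
def or_group(synonyms):
    """Join a set of synonyms with OR, quoting each term."""
    return " OR ".join('"' + s + '"' for s in synonyms)


def expand_over_fields(synonyms, fields):
    """Repeat one synonym set once per field (hypothetical field(...) notation)."""
    per_field = [field + "(" + or_group(synonyms) + ")" for field in fields]
    return "(" + " OR ".join(per_field) + ")"


def build_fielded_query(synonym_sets, fields):
    """Expand every synonym set over all fields, then join the sets with AND."""
    return " AND ".join(expand_over_fields(s, fields) for s in synonym_sets)


synonym_sets = [
    ["determinant", "determinants", "correlate", "correlates"],
    ["ecstasy", "XTC", "MDMA"],
]

one_field = build_fielded_query(synonym_sets, ["title"])
three_fields = build_fielded_query(synonym_sets, ["title", "abstract", "keywords"])

# Compare the length of the two query strings.
print(len(one_field), len(three_fields))
```

In this sketch every synonym appears once per searched field, so a change to a synonym list only has to be made in the conceptual query and the expanded strings can be regenerated.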

30.2.3 Wildcard characters

The query languages used by each interface have many advanced features that you can use to build powerful queries, and it is worthwhile diving into those. In addition to logical and other operators, another category of such features is wildcard characters. For example, the asterisk (*) can often be used to signify “zero or more alphabetic characters,” and the question mark (?) can often be used to signify “zero or one alphabetic character.” This allows you to rewrite this query fragment:

"behavior" OR "behaviors" OR "behavioral" OR "behaviour" OR "behaviours" OR "behavioural"

into this much shorter fragment:

"behavio?r*"

Because such operators differ per interface, it usually pays off to obtain a thorough understanding of the capabilities of each interface you will use before starting to craft your query (or while doing so), since you will want to create a query that’s as powerful and versatile as you can, but you will have to do this within the constraints of the query languages of the interfaces you’ll use.
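One way to check what a wildcard pattern would match before running it is to emulate it locally. The sketch below assumes the two meanings given above (`*` as zero or more alphabetic characters, `?` as zero or one alphabetic character) and translates a pattern such as `behavio?r*` into a regular expression; real interfaces may define their wildcards differently, so treat this as an approximation:

```python
import re


def wildcard_to_regex(pattern):
    """Translate a simple wildcard pattern into a compiled regular expression.

    Assumes '*' means zero or more alphabetic characters and
    '?' means zero or one alphabetic character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[A-Za-z]*")
        elif ch == "?":
            parts.append("[A-Za-z]?")
        else:
            # Escape everything else so it is matched literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


variants = ["behavior", "behaviors", "behavioral",
            "behaviour", "behaviours", "behavioural"]

regex = wildcard_to_regex("behavio?r*")
matched = [word for word in variants if regex.fullmatch(word)]
print(matched)

# Wildcards also broaden the query beyond the original list.
print(bool(regex.fullmatch("behaviorism")))
```

The last line is a reminder that a wildcard pattern is not exactly equivalent to the list it replaces: it also matches words you did not write down, which may or may not be what you want.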

30.2.4 Team Consensus and Expert Consultation

It is important to achieve consensus with the team about the query before you finalize your preregistration and then run your query “for real.” If you miss important keywords, that can be a very expensive oversight to correct later on (depending on how smartly you designed your screening procedure; see below). For this reason, it is common to involve experts outside the research team to consult on the lists of synonyms and the logical structure of the query.

30.2.5 Databases versus Interfaces

Once you have formulated your conceptual query, you can start translating it into the languages that the interfaces of the databases you will use can understand. This language is generally specific to each interface. An interface is the application that performs the searches in the bibliographic databases for you and allows you to export the results in whichever format you want to use.

For example, PsycINFO is a bibliographic database maintained by the American Psychological Association. This database is accessible through various interfaces, and different institutions will have licenses with different interface providers, for example, Ebsco or Ovid. Ebsco and Ovid use different interface languages, so the syntax and operators you have to use to formulate your query will be different. Sometimes, a database maintainer offers its own interface: PubMed is a good example of this.

As a consequence of this heterogeneity in interface languages, once you have crafted your conceptual query, you have to translate it into each interface’s language. Depending on how many fields you want to search and on the features of each language, this can explode your query into quite unwieldy strings of characters. Make sure to document both the conceptual query and the final query you input into each database/interface combination.

30.2.6 “Smart” searching

When conducting a systematic review, make sure to disable all “smart” searching features of the interface(s) you use. These features expand your query by including other synonyms. However, this “smart” searching algorithm is in fact dumb: it cannot understand your goal(s) and/or research question(s), and so it will simply explode your query to find many more hits. Because you crafted a well-thought-through query, the vast majority of those additional hits will be irrelevant to your goals/questions.

A second problem of “smart” searching is that it is not replicable, since the algorithms implemented by these interface maintainers are adjusted over time. Since you cannot encode a “smart search version” parameter in your query specification, it’s not possible to solve this. As a consequence, using “smart” searching in effect renders your systematic review unsystematic: it can no longer be reproduced by other researchers — and worse, by yourself in the future.

Since systematic reviews typically take a year and often longer, you will often have to repeat your query towards the end, screening the additional hits, extracting entities from the additional inclusions, and re-running your analysis script. If your query was applied using “smart” searching, the results of this repeated query execution can change unpredictably.

Therefore, never use “smart” searching in the final query you will use (and freeze in your preregistration). You can use it while crafting your query, to find additional sources to include, inspect the titles and abstracts for search terms you may have missed (people use the weirdest synonyms at times), and improve your conceptual query accordingly.

30.2.7 Query validation

Usually, you’ll already have a few sources (e.g. articles, book chapters, etc.) that you know you will be including in your systematic review. While testing and perfecting your query, you’ll usually use these to check whether your query “works”: whether it finds the sources you know it’s supposed to find. If it fails to find one or more, first check whether it could have found them at all: all bibliographic databases have a limited scope, and so the source might simply not exist in that database (easily checked by entering its title as a query). If the source is in the database but was not among the hits, then your query is missing one or more synonyms, so add those.

A quick way to check whether a given source is included in your query is by combining it with your query: basically create a “single use query” that combines the source’s title (or DOI, or ISBN, or any other unique identifier for the source) with your query using the AND operator.
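Such “single use queries” can be generated for all known sources at once. A minimal sketch, in which the query is a shortened version of the earlier example and the title and DOI are invented placeholders rather than real sources:

```python
def single_use_query(query, identifier):
    """Combine the full query with one source's title or identifier using AND."""
    return "(" + query + ') AND "' + identifier + '"'


# Shortened version of the example query used earlier in this chapter.
query = '("determinant" OR "correlate") AND ("ecstasy" OR "MDMA")'

# Placeholder title and DOI, made up for this illustration.
known_sources = ["An invented article title", "10.0000/placeholder-doi"]

checks = [single_use_query(query, source) for source in known_sources]
for check in checks:
    print(check)
```

Each resulting query should return exactly one hit if the source is in the database and matched by your query, and zero hits otherwise. The exact way to quote a title or search a DOI field differs per interface, so the strings still have to be translated like any other query.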

30.2.8 Exporting query hits

Once you have run your queries, you will need to download the hits, i.e. the identified bibliographic records. There is usually a set of formats available: a very common format that is generally well-supported is RIS (a format developed by “Research Information Systems”), and another good choice is BibTeX. Before deciding on the format, make sure you know how you want to conduct your screening. Ideally, you will be able to easily repeat your query later: when you revise the manuscript and want to make sure your findings aren’t outdated; when you have updated your query because you discovered a mistake; or, in the case of living reviews, when you want to update the review.
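To give an impression of what an exported RIS file looks like and how little is needed to read one, here is a minimal sketch of a parser. It assumes the common layout in which each line starts with a two-character tag followed by two spaces, a hyphen, and a space, and each record ends with an `ER` line; it ignores continuation lines and other variations, so a dedicated package is the better choice for real work. The two records are invented:

```python
def parse_ris(text):
    """Parse RIS text into a list of records (dicts mapping tag to list of values)."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if line.startswith("ER  -"):
            # End of record: store it and start a new one.
            if current:
                records.append(current)
            current = {}
        elif len(line) >= 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            # Tags such as AU can repeat, so collect values in a list.
            current.setdefault(tag, []).append(value)
    return records


# Two invented records, for illustration only.
example_ris = "\n".join([
    "TY  - JOUR",
    "TI  - An invented example title",
    "AU  - Doe, J.",
    "AU  - Roe, R.",
    "PY  - 2020",
    "ER  - ",
    "TY  - CHAP",
    "TI  - Another invented title",
    "ER  - ",
])

records = parse_ris(example_ris)
print(len(records), records[0]["TI"][0])
```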

30.3 Screening efficiently and avoiding double work

[[ Still have to develop this into a section ]]

  1. download all hits of the queries in the various databases (and through the various interfaces) as .bibtex (or .ris) files to your PC;

  2. import these and deduplicate these (you can do this with a reference manager, or with the metabefor functions, which of course is what I always do because it’s more transparent and efficient, and easy to re-run if you slightly change a query)

  3. write the merged file to disk and send it to all screeners;

  4. make sure screeners can only see title, keywords, and abstracts, and are blinded from authors, journal, and year etc;

  5. let screeners indicate for each entry why it is excluded (based on a progressive list of exclusion criteria, that is based on your extraction scripts), or, for entries they cannot exclude, indicate inclusion;

  6. if you have a lot of hits (thousands), usually you first screen based on title only, and only in the second round, on abstract for those entries that could not be excluded based on title;

  7. after screening based on abstracts, acquire the full texts and screen those as well;

  8. then you have your list of included sources;

  9. something is only excluded if all two/three/… screeners exclude it (the reason can be different; but if one screener fails to exclude, it’s retained for the next step)
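Two of these steps, deduplication and combining the screeners’ decisions, can be sketched in a few lines. This is plain Python written for illustration and is not how the metabefor functions work; the record layout (dicts with `title` and an optional `doi`) and the convention that `None` means “could not exclude” are assumptions:

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record for each DOI or normalized title."""
    seen = set()
    unique = []
    for record in records:
        keys = {"title:" + normalize_title(record["title"])}
        if record.get("doi"):
            keys.add("doi:" + record["doi"].lower())
        if keys & seen:
            continue  # already saw this DOI or this title
        seen |= keys
        unique.append(record)
    return unique


def final_decision(decisions):
    """Exclude only if every screener gave an exclusion reason.

    Each decision is an exclusion reason (a string), or None if that
    screener could not exclude the entry. Reasons may differ.
    """
    if decisions and all(d is not None for d in decisions):
        return "exclude"
    return "retain"


# Invented records: the second is the same source from another database.
hits = [
    {"title": "An Invented Title", "doi": "10.0000/placeholder"},
    {"title": "An invented title.", "doi": None},
    {"title": "A different invented title", "doi": None},
]
merged = deduplicate(hits)
print(len(merged))
print(final_decision(["wrong population", "no determinants"]))
print(final_decision(["wrong population", None]))
```

Matching on normalized titles is crude: it will miss duplicates whose titles differ slightly between databases, so the merged file still deserves a manual check.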