How to Detect and Manage Duplicates in Rayyan – Rayyan Help Center

Before you begin: Deduplication should be done before your team starts screening. Resolving duplicates after screening has begun can affect decision counts and complicate your PRISMA record. Make sure all your reference imports are complete before running duplicate detection.

Why deduplication matters

When you search multiple databases on the same topic, the same articles often appear in more than one database. These duplicates inflate your reference count and, if left unresolved, can cause the same article to be screened multiple times — leading to inconsistent decisions and an inaccurate PRISMA flow diagram.

The number of duplicates removed is a required data point in the PRISMA “Identification” stage, so keeping this process clean and traceable is important for the integrity of your review.

How duplicate detection works

Duplicate detection in Rayyan is a two-phase process:

Detect — Rayyan scans your imported references and flags records that are likely duplicates, grouping them under Possible Duplicates with an Unresolved status
Resolve — you review each flagged reference and decide which record to keep, either manually one pair at a time, or automatically using the Auto-Resolver

Note: Duplicate detection can only be run once per unique dataset. If you want to run it again, you must first add or delete references from your review. If you see the message “A search must be added or deleted before running duplicate detection again,” detection has already been completed on your current dataset.

How to run duplicate detection

Make sure all your reference imports are complete — add all your database exports before running detection
Go to the Overview tab or the Review data tab
Click Detect Duplicates
Rayyan scans your dataset — a status message shows progress
Once complete, the total number of possible duplicates found is displayed and the Possible Duplicates facet in the left panel is updated

Best practice: Run duplicate detection only after you have finished importing references from all your databases. If you run it mid-import, you will need to add or delete references to trigger a re-run, which adds unnecessary steps to your workflow.

How to resolve duplicates manually

Manual resolution lets you review each duplicate pair side by side. This is recommended when your dataset is small or when you want to verify each match carefully.

In the Review data tab, find the Possible Duplicates section in the left panel
Click Unresolved to filter and see only the duplicates that still need a decision
Click on a record to open it, then click Resolve Duplicates
Rayyan displays the duplicate records side by side with a confidence percentage
Compare the records and choose one of the following actions:

Action	What happens
Keep Left Article	The right article is moved to Deleted; the left is kept
Keep Right Article	The left article is moved to Deleted; the right is kept
Keep Both Articles	Both records are marked as Not Duplicate and kept active
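
The table above can also be read as a small lookup from action to outcome. The following is only an illustrative sketch: the function name, the action keys, and the "Kept" label are hypothetical and are not part of Rayyan or any Rayyan API. Only the "Deleted" and "Not Duplicate" outcomes come from the table itself.

```python
# Illustrative model of the three manual-resolution actions.
# All names here are hypothetical; this is not Rayyan code.

def resolve_pair(action: str) -> dict:
    """Return the resulting status of the left and right records."""
    outcomes = {
        # The right article is moved to Deleted; the left is kept.
        "keep_left": {"left": "Kept", "right": "Deleted"},
        # The left article is moved to Deleted; the right is kept.
        "keep_right": {"left": "Deleted", "right": "Kept"},
        # Both records are marked Not Duplicate and stay active.
        "keep_both": {"left": "Not Duplicate", "right": "Not Duplicate"},
    }
    if action not in outcomes:
        raise ValueError(f"unknown action: {action!r}")
    return dict(outcomes[action])


if __name__ == "__main__":
    for name in ("keep_left", "keep_right", "keep_both"):
        print(name, resolve_pair(name))
```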

Note: When you select a record to keep, the other record moves to Deleted status; it is not immediately and permanently erased. You can still see deleted duplicates by filtering for “Deleted” in the Possible Duplicates facet. Unresolved duplicates remain in your active reference set, so resolve them all before starting screening.

How to choose which record to keep

When two records represent the same article, consider:

Completeness — prefer the record with a full abstract, DOI, volume, issue, and page numbers
Source reliability — records from primary databases (e.g., PubMed, Scopus) tend to have more complete metadata than secondary sources
Existing customization — if one record already has a screening decision, label, or note attached, keep that one to preserve your team’s work. Records with customizations are indicated by an exclamation mark next to the title
PDF attachment — if one record already has a full-text PDF attached, keep that one
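
For teams that want to apply these considerations consistently, they can be written down as an explicit rule. The sketch below is one possible ordering, not Rayyan's logic: it assumes existing customizations outrank a PDF attachment, which in turn outranks metadata completeness, and it leaves out source reliability because that depends on your own databases. The `Record` class and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for a reference; not a Rayyan data structure."""
    title: str
    abstract: Optional[str] = None
    doi: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    pages: Optional[str] = None
    has_customization: bool = False  # screening decision, label, or note
    has_pdf: bool = False            # full-text PDF attached


def completeness(record: Record) -> int:
    """Count how many of the five metadata fields are filled in."""
    fields = (record.abstract, record.doi, record.volume,
              record.issue, record.pages)
    return sum(bool(value) for value in fields)


def keep_score(record: Record) -> tuple:
    """Assumed priority: customization, then PDF, then completeness."""
    return (record.has_customization, record.has_pdf, completeness(record))


def choose(left: Record, right: Record) -> str:
    """Return 'left', 'right', or 'tie' (a tie needs a human decision)."""
    left_score, right_score = keep_score(left), keep_score(right)
    if left_score > right_score:
        return "left"
    if right_score > left_score:
        return "right"
    return "tie"


if __name__ == "__main__":
    a = Record("Example trial", abstract="Some text", doi="10.1000/example")
    b = Record("Example trial", has_customization=True)
    print(choose(a, b))
```

In this sketch the record with an existing customization wins even though the other has more metadata, which mirrors the advice to preserve your team's work.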

Best practice: Pay close attention to the count of articles grouped as duplicates. Some groups may contain more than two records, so make sure you review all records in the group, not just the first pair shown.

Auto-Resolver

For larger datasets, resolving duplicates one by one can be time-consuming. The Auto-Resolver lets you define matching criteria and automatically resolve duplicates that meet your rules. For full setup instructions and criteria details, see How to Use the Auto-Resolver in Rayyan.

Important: Auto-resolution is irreversible. Always verify your settings before confirming an Auto-Resolver run.

Reference counts and PRISMA reporting

After running deduplication, your reference counts update as follows:

Total imported references — full count of all records across all imports, before deduplication
Records screened — imported references minus deleted duplicates
Possible Duplicates → Deleted — the number you need for the PRISMA “Records removed before screening” field
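
The relationship between these three counts is simple arithmetic. As a minimal sketch, with made-up numbers and hypothetical names (nothing here is read from or exported by Rayyan), the check below can help confirm that the figures you copy into your PRISMA flow diagram add up:

```python
def prisma_counts(total_imported: int, deleted_duplicates: int) -> dict:
    """Derive the screening count from the two numbers you note down."""
    if total_imported < 0 or deleted_duplicates < 0:
        raise ValueError("counts must be non-negative")
    if deleted_duplicates > total_imported:
        raise ValueError("cannot delete more duplicates than were imported")
    return {
        "total_imported": total_imported,
        "duplicates_removed": deleted_duplicates,
        # Records screened = imported references minus deleted duplicates.
        "records_screened": total_imported - deleted_duplicates,
    }


if __name__ == "__main__":
    # Example figures only: 1,200 imported references, 250 deleted duplicates.
    print(prisma_counts(1200, 250))
```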
