Before you begin
Deduplication should be done before your team starts screening. Resolving duplicates after screening has begun can affect decision counts and complicate your PRISMA record. Make sure all your reference imports are complete before running duplicate detection.
Why deduplication matters
When you search multiple databases on the same topic, the same articles often appear in more than one database. These duplicates inflate your reference count and, if left unresolved, can cause the same article to be screened multiple times — leading to inconsistent decisions and an inaccurate PRISMA flow diagram.
The number of duplicates removed is a required data point in the PRISMA “Identification” stage, so keeping this process clean and traceable is important for the integrity of your review.
How duplicate detection works
Duplicate detection in Rayyan is a two-phase process:
- Detect — Rayyan scans your imported references and flags records that are likely duplicates, grouping them under Possible Duplicates with an Unresolved status
- Resolve — you review each flagged reference and decide which record to keep, either manually one pair at a time, or automatically using the Auto-Resolver
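To make the Detect phase more concrete, here is a minimal sketch of how likely duplicates could be grouped. It is not Rayyan's actual matching algorithm, which this article does not describe. The function names, the record fields (`id`, `title`, `year`), and the matching rule (same normalized title and same year) are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and spacing differences."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def detect_possible_duplicates(records: list[dict]) -> list[list[int]]:
    """Return groups of record ids that share a normalized title and year.

    Hypothetical rule for illustration only; real tools typically use
    fuzzier matching and report a confidence score.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_title(rec["title"]), rec.get("year"))
        groups[key].append(rec["id"])
    # Only groups with more than one record are possible duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Illustrative records, as if exported from two different databases.
records = [
    {"id": 1, "title": "Sleep and Memory: A Review.", "year": 2020},
    {"id": 2, "title": "sleep and memory - a review", "year": 2020},
    {"id": 3, "title": "Exercise and Mood", "year": 2019},
]
groups = detect_possible_duplicates(records)
print(groups)
```

In this sketch, records 1 and 2 land in the same group because their titles differ only in case and punctuation. Each such group would then go through the Resolve phase, where you decide which record to keep.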
How to run duplicate detection
- Make sure all your reference imports are complete — add all your database exports before running detection
- Go to the Overview tab or the Review data tab
- Click Detect Duplicates
- Rayyan scans your dataset — a status message shows progress
- Once complete, the total number of possible duplicates found is displayed and the Possible Duplicates facet in the left panel is updated
How to resolve duplicates manually
Manual resolution lets you review each duplicate pair side by side. This is recommended when your dataset is small or when you want to verify each match carefully.
- In the Review data tab, find the Possible Duplicates section in the left panel
- Click Unresolved to filter and see only the duplicates that still need a decision
- Click on a record to open it, then click Resolve Duplicates
- Rayyan displays the duplicate records side by side with a confidence percentage
- Compare the records and choose one of the following actions:
| Action | What happens |
|---|---|
| Keep Left Article | The right article is moved to Deleted; the left is kept |
| Keep Right Article | The left article is moved to Deleted; the right is kept |
| Keep Both Articles | Both records are marked as Not Duplicate and kept active |
How to choose which record to keep
When two records represent the same article, consider:
- Completeness — prefer the record with a full abstract, DOI, volume, issue, and page numbers
- Source reliability — records from primary databases (e.g., PubMed, Scopus) tend to have more complete metadata than secondary sources
- Existing customization — if one record already has a screening decision, label, or note attached, keep that one to preserve your team’s work. Records with customizations are indicated by an exclamation mark next to the title
- PDF attachment — if one record already has a full-text PDF attached, keep that one
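The criteria above can be sketched as a simple comparison. This is a hypothetical helper, not a Rayyan feature: the field names (`decision`, `labels`, `notes`, `pdf_attached`, and so on) and the priority order (customization first, then PDF, then completeness) are assumptions, since the article lists the criteria without ranking them.

```python
# Bibliographic fields counted toward completeness (taken from the list above).
COMPLETENESS_FIELDS = ("abstract", "doi", "volume", "issue", "pages")


def keep_score(record: dict) -> tuple:
    """Score a record as (has customization, has PDF, number of filled fields)."""
    has_customization = bool(
        record.get("decision") or record.get("labels") or record.get("notes")
    )
    has_pdf = bool(record.get("pdf_attached"))
    completeness = sum(1 for field in COMPLETENESS_FIELDS if record.get(field))
    return (has_customization, has_pdf, completeness)


def suggest_record_to_keep(left: dict, right: dict) -> str:
    """Return "left" or "right"; ties go to the left record."""
    return "left" if keep_score(left) >= keep_score(right) else "right"


# Illustrative pair: the left record is more complete, but the right one
# already carries a label from the team (placeholder DOI, not a real article).
left = {"abstract": "Full abstract text", "doi": "10.1000/example",
        "volume": "12", "issue": "3", "pages": "1-10"}
right = {"abstract": "", "doi": "10.1000/example", "labels": ["rct"]}
suggestion = suggest_record_to_keep(left, right)
print(suggestion)
```

Under these assumed priorities the sketch suggests keeping the right record, because preserving existing screening work outweighs the more complete metadata on the left. If your team ranks the criteria differently, reorder the tuple returned by `keep_score`.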
Auto-Resolver
For larger datasets, resolving duplicates one by one can be time-consuming. The Auto-Resolver lets you define matching criteria and automatically resolve duplicates that meet your rules. For full setup instructions and criteria details, see How to Use the Auto-Resolver in Rayyan.
Reference counts and PRISMA reporting
After running deduplication, your reference counts update as follows:
- Total imported references — full count of all records across all imports, before deduplication
- Records screened — imported references minus deleted duplicates
- Possible Duplicates → Deleted — the number you need for the PRISMA “Records removed before screening” field
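As a worked example of how these counts relate, the sketch below derives the records-screened figure from the other two. The function name and the numbers (1,200 imported, 310 deleted duplicates) are made up for illustration and do not come from Rayyan.

```python
def prisma_identification_counts(total_imported: int, deleted_duplicates: int) -> dict:
    """Derive the PRISMA identification-stage counts described above."""
    if total_imported < 0 or deleted_duplicates < 0:
        raise ValueError("counts must be non-negative")
    if deleted_duplicates > total_imported:
        raise ValueError("cannot delete more duplicates than were imported")
    return {
        "total_imported": total_imported,
        "duplicates_removed_before_screening": deleted_duplicates,
        "records_screened": total_imported - deleted_duplicates,
    }


# Illustrative numbers only.
counts = prisma_identification_counts(total_imported=1200, deleted_duplicates=310)
print(counts["records_screened"])
```

With these example numbers, 1,200 imported references minus 310 deleted duplicates leaves 890 records to screen. The three figures should always reconcile this way; if they do not, check whether references were imported or deleted after duplicate detection was run.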
Continue to: How to Use the Auto-Resolver in Rayyan