Rayyan Prediction Classifier – Rayyan Help Center

Rayyan's built-in classifier uses machine learning to assist with screening by learning from your inclusion/exclusion decisions. Here’s a breakdown of how it works and the research behind it.

🔍 Classifier Overview

Rayyan uses a Support Vector Machine (SVM) classifier trained on key features extracted from each citation’s title and abstract, including:

🧩 Unigrams (single words)
🧩 Bigrams (pairs of words)
🧬 MeSH Terms (Medical Subject Headings)

These features are extracted after stopwords are removed and remaining terms are stemmed.
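As a rough illustration of the preprocessing described above, here is a minimal Python sketch. It is not Rayyan's actual code: the stopword list, the `naive_stem` helper, and the `MESH:` prefix are all assumptions made for this example, and a real pipeline would likely use a full stopword list and an established stemmer.

```python
import re

# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}

def naive_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_features(text, mesh_terms=()):
    """Return stemmed unigrams, bigrams of those stems, and tagged MeSH terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stopwords first, then stem what remains.
    stems = [naive_stem(t) for t in tokens if t not in STOPWORDS]
    # Bigrams are built from adjacent stems after stopword removal.
    bigrams = [f"{a} {b}" for a, b in zip(stems, stems[1:])]
    # MeSH terms are taken as given (citation metadata), only tagged with a prefix.
    mesh = [f"MESH:{term}" for term in mesh_terms]
    return stems + bigrams + mesh

features = extract_features("Screening of randomized trials in adults", ["Adult"])
print(features)
```

Each citation ends up as a list of feature strings that a classifier can turn into counts or weights.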

⚙️ How It Learns

As you (and your team) make inclusion/exclusion decisions, Rayyan's classifier starts to learn from your patterns. Once you’ve made at least:

✅ 50 screening decisions
- With a minimum of 5 "Include" and 5 "Exclude" decisions

…the system trains the model to classify the remaining undecided articles.
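The training threshold above can be expressed as a simple check. This is only a sketch: the function name, the label strings, and the choice to count only "include" and "exclude" decisions toward the total are assumptions, not Rayyan's implementation.

```python
from collections import Counter

MIN_DECISIONS = 50  # total screening decisions required
MIN_PER_CLASS = 5   # minimum "Include" and minimum "Exclude" decisions

def ready_to_train(decisions):
    """Return True once there are enough decisions of each kind to train."""
    counts = Counter(decisions)
    # Assumption: only include/exclude decisions count toward the total.
    total = counts["include"] + counts["exclude"]
    return (
        total >= MIN_DECISIONS
        and counts["include"] >= MIN_PER_CLASS
        and counts["exclude"] >= MIN_PER_CLASS
    )

# 50 decisions, but only 4 includes: not enough yet.
print(ready_to_train(["include"] * 4 + ["exclude"] * 46))
# 50 decisions with 5 includes and 45 excludes: training can start.
print(ready_to_train(["include"] * 5 + ["exclude"] * 45))
```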

It then calculates a confidence score for each unscreened article — based on how similar it is to your previous decisions — and translates it into a thumbs up (Include) or thumbs down (Exclude) rating.
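To make the scoring step more concrete, here is a small sketch of how a linear model could turn features into a score and then into a rating. It assumes a linear SVM whose score is a weighted sum of features plus a bias; the weights, the bias, and the zero threshold are hypothetical values chosen for illustration, not numbers from Rayyan.

```python
def decision_score(weights, bias, features):
    """Score of a linear model: sum of the weights of present features, plus bias."""
    return sum(weights.get(f, 0.0) for f in features) + bias

def to_rating(score, threshold=0.0):
    """Map a signed score to a rating (the threshold is an assumption)."""
    return "thumbs up" if score > threshold else "thumbs down"

# Hypothetical weights a model might learn from past decisions.
weights = {"randomiz": 1.2, "trial": 0.8, "mouse": -1.5}
bias = -0.5

for features in (["randomiz", "trial", "adult"], ["mouse", "adult"]):
    score = decision_score(weights, bias, features)
    print(features, round(score, 2), to_rating(score))
```

Features that appeared mostly in included citations push the score up, and features seen mostly in excluded citations push it down.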

🔁 Continuous Improvement

As you keep screening, Rayyan can re-evaluate its prediction model when you run Compute ratings again. If it detects that new labeled examples could improve its predictions, it retrains the classifier and re-scores the remaining undecided citations. Once additional training data has been received, Compute ratings can be run every 8 minutes. This process continues until:

🗃️ All citations are labeled
🧠 The model reaches optimal performance and can't be further improved
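The retraining conditions can be sketched as a small gate function. This reflects one reading of the rules above: a new run needs new decisions, remaining undecided citations, and at least 8 minutes since the previous run. The function and parameter names are hypothetical, and the check for "optimal performance" is not modeled here.

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=8)  # interval between runs, per the text above

def can_compute_ratings(last_run, now, new_decisions, undecided):
    """Return True if another Compute ratings run is allowed (assumed rules)."""
    if undecided == 0:       # all citations are labeled, nothing left to score
        return False
    if new_decisions == 0:   # no additional training data since the last run
        return False
    # First run ever, or the cooldown has elapsed since the previous run.
    return last_run is None or now - last_run >= COOLDOWN

last_run = datetime(2024, 1, 1, 12, 0)
print(can_compute_ratings(last_run, datetime(2024, 1, 1, 12, 5), 10, 200))
print(can_compute_ratings(last_run, datetime(2024, 1, 1, 12, 8), 10, 200))
```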

📊 Validated Performance

Rayyan’s classifier was tested using the same features in a previous study:
➡️ Read the full JAMIA study

Study Highlights:

📚 15 systematic review datasets
🧪 2-fold cross-validation (repeated 10 times, with 50% training / 50% testing)
🧮 Metrics used:
- AUC (Area Under the Curve): 0.87 ± 0.09
- WSS@95 (Work Saved over Sampling at 95% recall): 0.49 ± 0.18

WSS@95 tells you what fraction of citations reviewers didn't have to screen thanks to the classifier, compared with screening in random order, while still maintaining 95% recall.
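For readers who want to see the arithmetic, WSS@95 is commonly defined as the share of citations left unscreened when 95% recall is reached, minus 0.05 (the share you would expect to skip with random ordering at the same recall). The sketch below follows that common definition; the function name and the toy ranking are made up for illustration and do not come from the study.

```python
import math

def wss_at_recall(ranked_labels, recall=0.95):
    """Work Saved over Sampling at a target recall.

    ranked_labels: 1 = relevant, 0 = irrelevant, ordered from the
    classifier's most confident "Include" to its least confident.
    """
    total = len(ranked_labels)
    relevant = sum(ranked_labels)
    if total == 0 or relevant == 0:
        raise ValueError("need at least one citation and one relevant citation")
    # Number of relevant citations that must be found to hit the target recall.
    needed = math.ceil(recall * relevant - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            # Share left unscreened, minus the share random order would skip.
            return (total - screened) / total - (1 - recall)

# Toy ranking: 20 citations, 4 relevant, all found within the first 5.
ranked = [1, 1, 1, 0, 1] + [0] * 15
print(round(wss_at_recall(ranked), 2))  # about 0.70
```

In this toy case reviewers stop after 5 of 20 citations, so 75% are left unscreened, and subtracting the 5% baseline gives a WSS@95 of about 0.70.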

📘 Want to Dive Deeper?

Read the paper: You can check out more technical details in our original publication: Rayyan - a web and mobile application for systematic reviews

📌 Note: We're currently working on a new publication focused on Rayyan and its unique approach.

Latest Research: Rayyan has introduced a new, premium predictions classifier that combines different approaches and requires just 4 inclusion decisions and 1 exclusion decision to start generating predictions. For more information on how to access this advanced capability, please contact Sales.

💬 Still Need Help?

We’re here for you. Submit a support ticket and we’ll assist you personally.

And don’t forget to follow us on Twitter @rayyanapp for updates, tips, and tricks!

Related Research

We also published a research paper on a different classifier based on a Random Forest: Machine Learning for Systematic Reviews - Springer Article. We opted not to use that model in Rayyan because some of the features in that study are impractical to incorporate into a real-time production system.