Rayyan's built-in classifier uses machine learning to assist with screening by learning from your inclusion/exclusion decisions. Here’s a breakdown of how it works and the research behind it.
🔍 Classifier Overview
Rayyan uses a Support Vector Machine (SVM) classifier trained on key features extracted from each citation's title and abstract, including:
- 🧩 Unigrams (single words)
- 🧩 Bigrams (pairs of words)
- 🧬 MeSH Terms (Medical Subject Headings)
These features are extracted after stopwords are removed and the remaining terms are stemmed.
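To make the feature list concrete, here is a minimal Python sketch of how unigram, bigram, and MeSH features could be built from one citation. It is illustrative only: the stopword list is a tiny hypothetical sample, stem() is a crude suffix stripper standing in for a real stemmer such as Porter's, and the extract_features name and the uni:/bi:/mesh: prefixes are our own choices, not Rayyan's internals.

```python
import re

# Hypothetical, abbreviated stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

# Crude suffix stripper standing in for a real stemmer (e.g. Porter's).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(title: str, abstract: str, mesh_terms: list[str]) -> set[str]:
    """Return unigram, bigram, and MeSH features for one citation."""
    tokens = re.findall(r"[a-z0-9]+", f"{title} {abstract}".lower())
    # Remove stopwords first, then stem whatever remains.
    stems = [stem(t) for t in tokens if t not in STOPWORDS]
    unigrams = {f"uni:{s}" for s in stems}
    # Bigrams pair consecutive stems left after stopword removal.
    bigrams = {f"bi:{a}_{b}" for a, b in zip(stems, stems[1:])}
    mesh = {f"mesh:{m.lower()}" for m in mesh_terms}
    return unigrams | bigrams | mesh
```

For example, extract_features("Effects of exercise", "Exercise reduces blood pressure in adults", ["Hypertension"]) yields features such as uni:effect, bi:blood_pressure, and mesh:hypertension, while the stopword "of" is dropped. Because stopwords are removed before pairing, this sketch also forms bigrams across removed words (here bi:effect_exercise); whether Rayyan does the same is not stated.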
⚙️ How It Learns
As you (and your team) make inclusion/exclusion decisions, Rayyan's classifier starts to learn from your patterns. Once you've made at least:
- ✅ 50 screening decisions
- including a minimum of 5 "Include" and 5 "Exclude" decisions
…the system trains the model to classify the remaining undecided articles.
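The training threshold can be expressed as a simple check. This sketch assumes decisions are stored as the strings "include", "exclude", or "maybe", and that only Include and Exclude decisions count toward the 50; the function name and label strings are hypothetical.

```python
# Thresholds stated in the article.
MIN_TOTAL = 50
MIN_PER_CLASS = 5


def ready_to_train(decisions: list[str]) -> bool:
    """Check the minimum-data rule before training the classifier.

    Assumption: labels are "include", "exclude", or "maybe", and only
    Include/Exclude decisions count toward the 50-decision minimum.
    """
    includes = decisions.count("include")
    excludes = decisions.count("exclude")
    return (
        includes + excludes >= MIN_TOTAL
        and includes >= MIN_PER_CLASS
        and excludes >= MIN_PER_CLASS
    )
```

For instance, 5 Include plus 45 Exclude decisions passes the check, while 4 Include plus 50 Exclude does not, because the per-class minimum is not met.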
Rayyan then calculates a confidence score for each unscreened article, based on how similar it is to your previous decisions, and translates that score into a thumbs up (Include) or thumbs down (Exclude) rating.
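The scoring step can be illustrated with a small linear SVM. This is a sketch under stated assumptions, not Rayyan's implementation: it trains with Pegasos (stochastic sub-gradient descent on the regularized hinge loss) over sets of binary features such as stemmed unigrams, bigrams, and MeSH terms, treats the sign of the decision value as the thumbs up/down rating, and uses its magnitude as the confidence. Rayyan's actual training algorithm, hyperparameters (lam and epochs here), and score-to-rating mapping are not described in this article.

```python
import random

# Constant feature so the model can learn an offset. For simplicity the
# bias is regularized like any other weight.
BIAS = "__bias__"


def train_linear_svm(examples, labels, lam=0.1, epochs=200, seed=0):
    """Train a linear SVM with Pegasos (stochastic sub-gradient descent).

    examples: list of sets of feature names (binary features).
    labels:   +1 for Include, -1 for Exclude.
    Returns a dict mapping feature name -> weight.
    """
    rng = random.Random(seed)
    data = [(set(x) | {BIAS}, y) for x, y in zip(examples, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w.get(f, 0.0) for f in x)
            # Regularization step: shrink every existing weight.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss step: move toward examples that violate the margin.
            if margin < 1.0:
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def score(w, features):
    """Signed decision value: positive leans Include, negative leans Exclude."""
    return sum(w.get(f, 0.0) for f in set(features) | {BIAS})


def rate(w, features):
    """Map the decision value to a thumbs rating plus a confidence magnitude."""
    s = score(w, features)
    return ("thumbs up" if s > 0 else "thumbs down"), abs(s)
```

For instance, training on a few Include citations that share the feature uni:exercise and Exclude citations that share uni:mouse should give a new citation containing uni:exercise a positive score (thumbs up). A linear model suits sparse text features because scoring a citation only touches the weights of the features it actually contains.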
🔁 Continuous Improvement
As you keep screening, you can run Compute ratings again to have Rayyan re-evaluate its prediction model. If it detects that the new labeled examples could improve its predictions, it retrains the classifier and re-scores the remaining undecided citations. Compute ratings can be run again every 8 minutes, provided new training data (screening decisions) has been added since the last run. This process continues until:
- 🗃️ All citations are labeled
- 🧠 The model reaches optimal performance and can't be further improved
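One way to read the re-run rule is as a cooldown plus a new-data check. The sketch below assumes that reading (at least 8 minutes since the last run and at least one new decision since then), and it also stops offering a re-run once no undecided citations remain. RatingState, can_compute_ratings, and record_run are hypothetical names; Rayyan's real conditions, including how it judges whether retraining would help, may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed cooldown between runs, taken from the 8-minute figure above.
COOLDOWN = timedelta(minutes=8)


@dataclass
class RatingState:
    """Hypothetical bookkeeping for the last Compute ratings run."""
    last_run: Optional[datetime] = None
    decisions_at_last_run: int = 0


def can_compute_ratings(state: RatingState, now: datetime,
                        total_decisions: int, undecided: int) -> bool:
    """Return True if a (re-)run of Compute ratings should be allowed."""
    if undecided == 0:
        return False  # every citation is labeled; nothing left to score
    if state.last_run is None:
        return True  # first run (training thresholds assumed checked separately)
    has_new_data = total_decisions > state.decisions_at_last_run
    cooled_down = now - state.last_run >= COOLDOWN
    return has_new_data and cooled_down


def record_run(state: RatingState, now: datetime, total_decisions: int) -> None:
    """Remember when the run happened and how much training data it used."""
    state.last_run = now
    state.decisions_at_last_run = total_decisions
```

Tracking the decision count at the last run, rather than just a timestamp, is what lets the sketch refuse a re-run when nothing new has been labeled.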
📊 Validated Performance
The same features used by Rayyan's classifier were evaluated in a previous study:
➡️ Read the full JAMIA study
Study Highlights:
- 📚 15 systematic review datasets
- 🧪 2-fold cross-validation (repeated 10 times, with 50% training / 50% testing)
- 🧮 Metrics used:
  - AUC (Area Under the Curve): 0.87 ± 0.09
  - WSS@95 (Work Saved over Random Sampling at 95% recall): 0.49 ± 0.18
WSS@95 measures the share of citations reviewers did not have to screen thanks to the classifier, compared with screening in random order, while still finding 95% of the relevant citations (95% recall).
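The evaluation protocol and both metrics can be sketched in a few functions. repeated_two_fold_splits shuffles the citations, splits them in half, and uses each half once for testing, giving 20 train/test evaluations over 10 repetitions; it assumes plain random splits, although the study may have stratified by label. wss_at_recall uses the common definition WSS = (TN + FN)/N - (1 - recall), evaluated at the shortest prefix of the ranking that reaches 95% recall, and auc computes the probability that a randomly chosen relevant citation outscores a randomly chosen irrelevant one (ties count as half). All function names are hypothetical, and the study's exact computation may differ in details such as tie handling.

```python
import math
import random


def repeated_two_fold_splits(n_items, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for 2-fold CV repeated `repeats` times.

    Each repetition shuffles the items, splits them in half, and uses each
    half once as the test set (plain random splits, no stratification).
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    for _ in range(repeats):
        rng.shuffle(indices)
        half = n_items // 2
        first, second = sorted(indices[:half]), sorted(indices[half:])
        yield first, second
        yield second, first


def wss_at_recall(scores, labels, target_recall=0.95):
    """Work saved over sampling at the given recall.

    labels: 1 = relevant (included), 0 = irrelevant. Higher score means
    more likely relevant. Tied scores keep their input order.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("need at least one relevant citation")
    # Relevant citations needed to reach the target recall; the epsilon
    # guards against floating-point error pushing the product past an integer.
    needed = math.ceil(target_recall * total_relevant - 1e-9)
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    found = 0
    for k, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            # Everything below position k is left unscreened (TN + FN).
            return (n - k) / n - (1 - target_recall)
    raise RuntimeError("unreachable: needed never exceeds total_relevant")


def auc(scores, labels):
    """ROC AUC as the fraction of (relevant, irrelevant) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Under this definition, a WSS@95 of 0.49 means roughly 54% of citations (0.49 + 0.05) could be left unscreened, compared with the 5% that screening in random order would save at the same recall.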
📘 Want to Dive Deeper?
Read the paper: For more technical details, see our original publication: Rayyan - a web and mobile application for systematic reviews
📌 Note: We're currently working on a new publication focused on Rayyan and its unique approach.
Latest Research: Rayyan has introduced a new premium predictions classifier that combines different approaches and needs only 4 Include decisions and 1 Exclude decision to start generating predictions. To learn how to access this advanced capability, please contact Sales.
💬 Still Need Help?
We’re here for you. Submit a support ticket and we’ll assist you personally.
And don’t forget to follow us on Twitter @rayyanapp for updates, tips, and tricks!
Related Research
We also published a research paper on a different classifier based on a Random Forest (Machine Learning for Systematic Reviews - Springer Article). We chose not to use that model in Rayyan because some of the features in that study are impractical to incorporate into a real-time production system.