About this site

This app visualizes how ClinVar variants are distributed across CADD PHRED-score thresholds to help choose sensible score cut-offs for specific use cases.


Quick links
  1. To explore the distribution of ClinVar variants across CADD PHRED-score thresholds, look here.

  2. To compare the different CADD versions and genome releases, look here.

  3. To investigate the gene-specific distribution of variants across CADD PHRED-score thresholds, look here.

  4. To investigate the panel-specific distribution of variants across CADD PHRED-score thresholds, look here.


What is CADD and how to use this application

CADD (Combined Annotation Dependent Depletion) is a tool for scoring the deleteriousness of single nucleotide variants, multi-nucleotide substitutions, and insertion/deletion variants in the human genome.
CADD provides two scores: the raw score and the PHRED score. For the PHRED score, all potential single nucleotide variants (SNVs) in the genome (~9 billion) are ranked by their predicted pathogenicity relative to all others. Each SNV is then assigned a PHRED score based on its rank. A variant that ranks in the top 10 percent of potentially pathogenic variants receives a PHRED score of 10 or higher; variants in the top 1 percent receive a score of 20 or higher. PHRED scores have a lower resolution than raw scores but are often preferred because they are easier to compare with other scores.
It might seem useful to have a universal cut-off value that clearly separates pathogenic from benign variants. However, the CADD authors advise against this, as the appropriate threshold depends on the specific analysis and use case. Applying a single universal cut-off would risk a considerable loss of valuable information.
Still, it is useful to see how variants are spread across different thresholds and to understand which factors affect what might be a good cut-off. The score distribution of known benign and pathogenic variants has been analysed and made available on this website to help find a good cut-off for specific use cases.
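
The rank-based scaling described above can be sketched in a few lines. This is only an illustration: it assumes the usual PHRED convention of -10 * log10(rank / total), and the function names are hypothetical and not part of CADD or of this application.

```python
import math


def phred_from_rank(rank: int, total: int) -> float:
    """PHRED-scaled score for the variant at 1-based `rank` among `total`
    variants, where rank 1 is the most deleterious (assumed convention)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return -10.0 * math.log10(rank / total)


def top_fraction_for_phred(phred: float) -> float:
    """Fraction of all variants that score at or above `phred`."""
    return 10.0 ** (-phred / 10.0)


if __name__ == "__main__":
    # A variant ranked 100th out of 1000 sits at the edge of the top 10 percent.
    print(phred_from_rank(100, 1000))
    # A PHRED score of 20 corresponds to the top 1 percent.
    print(top_fraction_for_phred(20))
```

Under this convention, every additional 10 PHRED points corresponds to a ten times smaller top fraction of variants.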

Which dataset was used and how?
  • Source: ClinVar (accessed 2025-02-28). Original file: ~6.8M entries.
  • Kept only high-quality reviews (expert panel / practice guideline / multiple submitters, no conflicts). After filtering --> 1,135,635 entries.
  • Kept clinical classes: benign, likely benign, pathogenic, likely pathogenic --> 668,455 entries.
  • Split by reference genome: GRCh37 (334,246) and GRCh38 (334,209).
  • Scored remaining variants with CADD v1.6 and v1.7. CADD does not score large indels (>50 bp), variants with a mismatched reference allele, or mitochondrial variants (4,085 unscored in GRCh37; 4,196 in GRCh38).
  • Duplicate annotations per variant were removed (one entry per variant is used in the "Genes" summary table).

GRCh37: 252,785 benign / 77,377 pathogenic
GRCh38: 252,626 benign / 77,387 pathogenic
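
The review-status and class filters listed above could look roughly like the following sketch. It is not the code used for this site: the column names 'ReviewStatus' and 'ClinicalSignificance' are taken from the annotation list on this page, while the exact status strings, the case-insensitive matching and the handling of combined classes are assumptions.

```python
# Review statuses treated as high quality (assumed spelling of the ClinVar values).
HIGH_QUALITY_REVIEW = {
    "reviewed by expert panel",
    "practice guideline",
    "criteria provided, multiple submitters, no conflicts",
}
BENIGN = {"benign", "likely benign"}
PATHOGENIC = {"pathogenic", "likely pathogenic"}


def label_variant(row):
    """Return 'benign', 'pathogenic', or None if the row is filtered out."""
    if row["ReviewStatus"].strip().lower() not in HIGH_QUALITY_REVIEW:
        return None
    significance = row["ClinicalSignificance"].strip().lower()
    if significance in BENIGN:
        return "benign"
    if significance in PATHOGENIC:
        return "pathogenic"
    return None  # e.g. uncertain significance or combined classes


def filter_variants(rows):
    """Keep only rows that pass both filters and attach a binary label."""
    kept = []
    for row in rows:
        label = label_variant(row)
        if label is not None:
            kept.append({**row, "label": label})
    return kept
```

Each row is assumed to be a dictionary, for example as produced by csv.DictReader on a tab-separated ClinVar export.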


Metrics used

Metric                     Meaning
True Negatives (TN)        Actual negatives correctly identified as negative
True Positives (TP)        Actual positives correctly identified as positive
False Negatives (FN)       Actual positives incorrectly identified as negative
False Positives (FP)       Actual negatives incorrectly identified as positive
Precision                  TP / (TP + FP): proportion of predicted positives that are actually positive
Recall (Sensitivity)       TP / (TP + FN): proportion of actual positives that are predicted positive
False Positive Rate (FPR)  FP / (FP + TN): proportion of actual negatives that are incorrectly predicted positive
Specificity                TN / (TN + FP): proportion of actual negatives that are predicted negative
F1 Score                   2 * (Precision * Recall) / (Precision + Recall): harmonic mean of precision and recall
F2 Score                   5 * (Precision * Recall) / (4 * Precision + Recall): like the F1 Score, but recall is weighted more heavily
Accuracy                   (TP + TN) / (TP + FP + FN + TN): proportion of all predictions that are correct
Balanced Accuracy          (Recall + Specificity) / 2: mean of recall and specificity, useful for imbalanced classes
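
As a worked example, the formulas above can be computed from the four confusion-matrix counts as in this minimal sketch. The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption, not necessarily how this application handles that case.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the metrics from the table for one set of confusion counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": safe_div(fp, fp + tn),
        "specificity": specificity,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "f2": safe_div(5 * precision * recall, 4 * precision + recall),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    for name, value in classification_metrics(tp=80, fp=20, fn=20, tn=880).items():
        print(f"{name}: {value:.4f}")
```

With these example counts, accuracy (0.96) is higher than balanced accuracy (about 0.89) because the negative class is much larger than the positive class.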



For more information on CADD and for references, please refer to the CADD website.
You may also consult these publications:

The most recent manuscript describes CADD v1.7, an extension to the annotations included in the model. Most prominently, this version improves the scoring of coding variants with features derived from the ESM-1v protein language model as well as the scoring of regulatory variants with features derived from a convolutional neural network trained on regions of open chromatin:

Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M.
CADD v1.7: Using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions.
Nucleic Acids Res. 2024 Jan 5. doi: 10.1093/nar/gkad989.
PubMed PMID: 38183205.


Then there is CADD-Splice (CADD v1.6), which specifically improved the prediction of splicing effects:

Rentzsch P, Schubach M, Shendure J, Kircher M.
CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores.
Genome Med. 2021 Feb 22. doi: 10.1186/s13073-021-00835-9.
PubMed PMID: 33618777.


Our third manuscript describes the updates between the initial publication and CADD v1.4, introduces CADD for GRCh38 and explains how we envision the use of CADD. It was published by Nucleic Acids Research in 2018:

Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M.
CADD: predicting the deleteriousness of variants throughout the human genome.
Nucleic Acids Res. 2018 Oct 29. doi: 10.1093/nar/gky1016.
PubMed PMID: 30371827.


Finally, the original manuscript describing the method was published by Nature Genetics in 2014:

Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J.
A general framework for estimating the relative pathogenicity of human genetic variants.
Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892.
PubMed PMID: 24487276.


Performance metrics across CADD PHRED scores

  1. Choose a CADD version and genome release (e.g., v1.7 and GRCh38).
  2. Choose the metrics you want to look at. For False Positives, True Positives, False Negatives and True Negatives, the number of variants is displayed; for Recall, Specificity, False Positive Rate, Precision, F1 Score, F2 Score, Accuracy and Balanced Accuracy, the percentage is displayed.
  3. Hover over the graph to see specific data points, and use the slider to change the range of the x-axis.
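
How the counts behind such a plot might be derived for a series of thresholds is sketched below. It assumes that pathogenic is the positive class and that a variant is predicted pathogenic when its PHRED score is greater than or equal to the threshold; both the convention and the function names are assumptions for illustration.

```python
def confusion_at_threshold(scores, is_pathogenic, threshold):
    """Return (TP, FP, FN, TN) when 'score >= threshold' means predicted pathogenic."""
    tp = fp = fn = tn = 0
    for score, pathogenic in zip(scores, is_pathogenic):
        predicted_pathogenic = score >= threshold
        if predicted_pathogenic and pathogenic:
            tp += 1
        elif predicted_pathogenic and not pathogenic:
            fp += 1
        elif pathogenic:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sweep_thresholds(scores, is_pathogenic, thresholds):
    """Confusion counts for every threshold, keyed by threshold."""
    return {t: confusion_at_threshold(scores, is_pathogenic, t) for t in thresholds}


if __name__ == "__main__":
    # Toy data, not taken from ClinVar.
    scores = [5, 12, 18, 25, 31]
    labels = [False, False, True, True, True]
    for threshold, counts in sweep_thresholds(scores, labels, [10, 15, 20]).items():
        print(threshold, counts)
```

Raising the threshold trades false positives for false negatives, which is the trade-off the metric curves make visible.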

Distributions

  1. You can also look at the distribution of variants across the different thresholds for your chosen CADD version and genome release.
  2. Use the slider to adjust the x-axis of the finer-scaled bar chart.
  3. To see how the consequences of all pathogenic variants are distributed across thresholds, look at the last bar chart (likely pathogenic variants are shown with a lower opacity).
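
A distribution like the one shown in these bar charts can be approximated by counting variants per score bin and class. The sketch below is illustrative only; the bin width of 5 and the function name are assumptions and do not describe the binning used on this site.

```python
import math
from collections import Counter


def bin_counts(scores, labels, bin_width=5):
    """Count variants per PHRED-score bin, separately for each class label.

    Each bin is identified by its lower edge, e.g. 20 for scores in [20, 25).
    """
    counts = {}
    for score, label in zip(scores, labels):
        bin_start = int(math.floor(score / bin_width) * bin_width)
        counts.setdefault(label, Counter())[bin_start] += 1
    return counts


if __name__ == "__main__":
    # Toy data, not taken from ClinVar.
    scores = [1.2, 4.9, 5.0, 23.7, 27.1]
    labels = ["benign", "benign", "benign", "pathogenic", "pathogenic"]
    for label, histogram in bin_counts(scores, labels).items():
        print(label, dict(sorted(histogram.items())))
```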

Comparing CADD versions and genome releases

  1. Choose a metric to compare
  2. Select the CADD version and genome releases to compare

Metrics Calculation for specific genes

  1. Upload a list of your genes (as a csv, txt or tsv file) or enter them in the text field.
  2. Choose your genome release and CADD version, then click the “Generate metrics” button.
  3. All metrics will now load in one line graph. (To show a single metric, double-click its name in the legend. To show several metrics, deselect the others by clicking once on their names in the legend.)
  • To see which variants were used in the calculation, together with their annotations, look at the table. You can choose whether to display the ClinVar annotations, the CADD annotations, or both. For ClinVar, only these annotations were kept: 'AlleleID', 'Type_x', 'Name', 'GeneID_x', 'GeneSymbol', 'Origin', 'OriginSimple', 'Chromosome', 'ReviewStatus', 'NumberSubmitters', 'VariationID', 'PositionVCF', 'ReferenceAlleleVCF', 'AlternateAlleleVCF', 'ClinicalSignificance'
  • To see how many variants were used per gene and whether they are pathogenic or benign, look at the bar chart (with many genes, individual bars may be hard to see, but you can zoom in). A table below the bar chart summarizes the same information.

Note:

  • The gene names in your list are matched against the gene names in the ClinVar and CADD databases. If a gene from your list is not found in these databases, it will be skipped, and a message will be displayed indicating which genes were not found.
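
The matching step described in the note can be pictured with the following sketch. It is not the application's implementation: the accepted separators, the case-insensitive comparison and the function names are assumptions made for illustration.

```python
import re


def parse_gene_list(text: str) -> list:
    """Split pasted or uploaded text into unique gene names, keeping input order.

    Commas, semicolons, tabs, spaces and line breaks are treated as separators.
    """
    genes, seen = [], set()
    for token in re.split(r"[,;\s]+", text.strip()):
        key = token.upper()
        if token and key not in seen:
            seen.add(key)
            genes.append(token)
    return genes


def match_genes(requested, known_symbols):
    """Return (found, missing): matched names in their known spelling,
    and requested names that have no match."""
    lookup = {symbol.upper(): symbol for symbol in known_symbols}
    found, missing = [], []
    for gene in requested:
        canonical = lookup.get(gene.upper())
        if canonical is not None:
            found.append(canonical)
        else:
            missing.append(gene)
    return found, missing


if __name__ == "__main__":
    requested = parse_gene_list("BRCA1, brca2\nTP53\tBRCA1")
    found, missing = match_genes(requested, {"BRCA1", "BRCA2", "CFTR"})
    print("found:", found)
    print("not found:", missing)
```

Genes that end up in the missing list correspond to the ones reported in the "not found" message.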

Metrics Calculation for gene panels (from PanelApp)

  1. Choose your genome release and CADD version.
  2. Select a gene panel from the dropdown menu.
  3. Click on the “Generate metrics” button.
  4. All metrics will now load in one line graph. (To show a single metric, double-click its name in the legend. To show several metrics, deselect the others by clicking once on their names in the legend.)
  • To see which variants were used in the calculation, together with their annotations, look at the table. You can choose whether to display the ClinVar annotations, the CADD annotations, or both. For ClinVar, only these annotations were kept: 'AlleleID', 'Type_x', 'Name', 'GeneID_x', 'GeneSymbol', 'Origin', 'OriginSimple', 'Chromosome', 'ReviewStatus', 'NumberSubmitters', 'VariationID', 'PositionVCF', 'ReferenceAlleleVCF', 'AlternateAlleleVCF', 'ClinicalSignificance'
  • To see how many variants were used per gene and whether they are pathogenic or benign, look at the bar chart (with many genes, individual bars may be hard to see, but you can zoom in). A table below the bar chart summarizes the same information.

Note:

  • The gene panels are retrieved from PanelApp. The data used in this tool may lag behind the latest PanelApp data.
  • The gene names in the panels are matched against the gene names in the ClinVar and CADD databases. If a gene from the panel is not found in these databases, it will be skipped, and a message will be displayed indicating which genes were not found.

Impressum / Imprint

The following information is required by German law. For your convenience, we are making a translation of the German text available at the bottom of the page. Please note that in case of a legal dispute, the German version takes precedence over the English version.


Projektleitung / Project leadership

Prof. Dr. Martin Kircher
E-Mail: martin.kircher [at] bih-charite.de
Tel: +49 30 450 543 004

Postanschrift / Postal Address

Charité – Universitätsmedizin Berlin

Campus Charité Mitte
Charitéplatz 1
D-10117 Berlin

Webmaster

Prof. Dr. Martin Kircher
Tel: +49 30 450 543 004


Disclaimer - Deutsch

Haftung für Inhalte

Die Inhalte unserer Seiten wurden mit größter Sorgfalt erstellt. Für die Richtigkeit, Vollständigkeit und Aktualität der Inhalte können wir jedoch keine Gewähr übernehmen.

Als Diensteanbieter sind wir gemäß § 7 Abs. 1 TMG für eigene Inhalte auf diesen Seiten nach den allgemeinen Gesetzen verantwortlich. Nach §§ 8 bis 10 TMG sind wir als Diensteanbieter jedoch nicht verpflichtet, übermittelte oder gespeicherte fremde Informationen zu überwachen oder nach Umständen zu forschen, die auf eine rechtswidrige Tätigkeit hinweisen. Verpflichtungen zur Entfernung oder Sperrung der Nutzung von Informationen nach den allgemeinen Gesetzen bleiben hiervon unberührt. Eine diesbezügliche Haftung ist jedoch erst ab dem Zeitpunkt der Kenntnis einer konkreten Rechtsverletzung möglich. Bei Bekanntwerden von entsprechenden Rechtsverletzungen werden wir diese Inhalte umgehend entfernen.

Datenschutzerklärung (DSGVO)

Diese Webseite sieht sich als Teil der Webpräsenz des Berlin Institute of Health (BIH) und der Charité - Universitätsmedizin Berlin. Es gelten die Datenschutzerklärung des BIH und die Datenschutzerklärung der Charité.

Diese Internetseite erfasst mit jedem Aufruf der Internetseite durch eine betroffene Person oder ein automatisiertes System eine Reihe von allgemeinen Daten und Informationen. Diese allgemeinen Daten und Informationen werden in den Logfiles des Servers gespeichert. Erfasst werden können:

  • die Unterwebseiten, welche über ein zugreifendes System auf unserer Internetseite angesteuert werden,
  • das Datum und die Uhrzeit eines Zugriffs auf die Internetseite,
  • eine Internet-Protokoll-Adresse (IP-Adresse),
  • der Internet-Service-Provider des zugreifenden Systems und sonstige ähnliche Daten und Informationen, die der Gefahrenabwehr im Falle von Angriffen auf unsere informationstechnologischen Systeme dienen,
  • sämtliche Dateien und Informationen, die bei der Benutzung der bereitgestellten Services anfallen.

Auf dieser Internetseite können bestimmte Dienste (z.B. Bewerten genomischer Varianten durch die Software CADD) unter Angabe von personenbezogenen Daten durchgeführt werden. Welche personenbezogenen Daten dabei übermittelt werden, ergibt sich aus der jeweiligen Eingabemaske. Allgemein werden bei der Benutzung der bereitgestellten Services, dem Bewerten genomischer Varianten durch die Software CADD, die folgenden Daten und Informationen erfasst:

  • sämtliche auf der Webseite durch Nutzende hochgeladenen Dateien,
  • die zur Kontaktierung Nutzender in der Eingabemaske angegebenen Informationen (E-Mail-Adresse, weitere Informationen),
  • sämtliche mit diesen Daten und Informationen in Verbindung stehenden Informationen (Metadaten) wie Dateinamen, Datum und Uhrzeit,
  • sowie bereits im vorhergehenden Abschnitt genannte allgemeine Daten und Informationen.

Es sei darauf hingewiesen, dass es ausdrückliche Aufgabe Nutzender dieser Webseite ist, dafür Sorge zu tragen, dass dabei keinerlei persönliche Daten Dritter verarbeitet werden.

Bei der Nutzung der genannten Daten und Informationen ziehen wir keine Rückschlüsse auf die betroffene Person. Diese Informationen werden vielmehr benötigt, um

  • die Inhalte unserer Internetseite korrekt auszuliefern,
  • die Inhalte unserer Internetseite zu optimieren,
  • die Nutzenden über die Verarbeitung ihrer Daten zu informieren,
  • die dauerhafte Funktionsfähigkeit unserer informationstechnologischen Systeme und der Technik unserer Internetseite zu gewährleisten sowie
  • um Strafverfolgungsbehörden im Falle eines Cyberangriffes die zur Strafverfolgung notwendigen Informationen bereitzustellen.

Diese anonym erhobenen Daten und Informationen werden daher von uns einerseits statistisch und ferner mit dem Ziel ausgewertet, den Datenschutz und die Datensicherheit in unserem Unternehmen zu erhöhen, um letztlich ein optimales Schutzniveau für die von uns verarbeiteten personenbezogenen Daten sicherzustellen. Die anonymen Daten der Server-Logfiles werden getrennt von allen durch eine betroffene Person angegebenen personenbezogenen Daten gespeichert.

Haftung für Links

Unser Angebot enthält Links zu externen Webseiten Dritter, auf deren Inhalte wir keinen Einfluss haben. Deshalb können wir für diese fremden Inhalte auch keine Gewähr übernehmen. Für die Inhalte der verlinkten Seiten ist stets der jeweilige Anbieter oder Betreiber der Seiten verantwortlich. Die verlinkten Seiten wurden zum Zeitpunkt der Verlinkung auf mögliche Rechtsverstöße überprüft. Rechtswidrige Inhalte waren zum Zeitpunkt der Verlinkung nicht erkennbar. Eine permanente inhaltliche Kontrolle der verlinkten Seiten ist jedoch ohne konkrete Anhaltspunkte einer Rechtsverletzung nicht zumutbar. Bei Bekanntwerden von Rechtsverletzungen werden wir derartige Links umgehend entfernen.

Urheberrecht Webseite

Die durch die Seitenbetreiber erstellten Inhalte und Werke auf diesen Seiten unterliegen dem deutschen Urheberrecht. Die Software CADD, sowie alle darüber bereit gestellten Dienste unterliegen dem amerikanischen Urheberrecht. Beiträge Dritter sind als solche gekennzeichnet. Die Vervielfältigung, Bearbeitung, Verbreitung und jede Art der Verwertung außerhalb der Grenzen des Urheberrechtes bedürfen der schriftlichen Zustimmung des jeweiligen Autors bzw. Erstellers. Downloads und Kopien dieser Seite sind nur für den privaten, nicht kommerziellen Gebrauch gestattet.

Die Betreiber der Seiten sind bemüht, stets die Urheberrechte anderer zu beachten bzw. auf selbst erstellte sowie lizenzfreie Werke zurückzugreifen.

Urheberrecht und Lizenzen zu CADD

Die Software CADD unterliegt dem amerikanischen Urheberrecht und den unten in englischer Sprache abgedruckten Nutzungs- und Haftungsbedingungen. Die Nutzung jeglicher mit der Software CADD verbundenen Daten und Dienste ist nur für den privaten oder nicht kommerziellen Gebrauch gestattet. Jegliche kommerzielle Nutzung bedarf der schriftlichen Zustimmung der Urheber. Lizenzen zur kommerziellen Nutzung sind über das UW CoMotion Express Licensing System erwerbbar. Sollten Zweifel bezüglich des kommerziellen Charakters einer Anwendung bestehen, kontaktieren Sie bitte Martin Kircher, Jay Shendure und Gregory M. Cooper und beschreiben Sie die genaueren Umstände.

Disclaimer - English

Liability for Contents

The contents of our pages and social media channels have been created with great care. However, we cannot take any responsibility for the accuracy, completeness or timeliness of the contents.

As a service provider, we are responsible under § 7 para. 1 TMG (German Telemedia Act) for our own content on these pages in accordance with general law. According to §§ 8 to 10 TMG, however, we are not obliged to monitor transmitted or stored third-party information or to investigate circumstances that indicate illegal activity. Obligations to remove or block the use of information under general law remain unaffected. However, liability in this respect is only possible from the time we become aware of a specific infringement. Upon becoming aware of such infringements, we will remove the content immediately.

Data Privacy Statement

This website is considered part of the online presence of Berlin Institute of Health (BIH) and Charité - Universitätsmedizin Berlin. Accordingly, the Data Privacy Statement of BIH (German only) and Data Privacy Statement of Charité apply.

This website records a number of general data and information each time a human user or automated system accesses the website. This general data and information is stored in the log files of the server. The following can be recorded:

  • the subpages accessed on our website,
  • the date and time of each access to the website,
  • an Internet Protocol address (IP address),
  • the Internet service provider of the accessing system and other similar data and information that serve to avert danger in the event of attacks on our information technology systems,
  • all files and information that are generated by the use of the provided services.

On this website, some services (such as the evaluation of genomic variants using the CADD software) can be carried out by providing personal data. Which personal data are transmitted is determined by the respective input form. In general, the following data and information are collected when using the provided services:

  • all files uploaded on the website by the user,
  • all information (email address, further information) specified in the input mask,
  • all information associated with these data and information (metadata), such as file names, date and time,
  • all general data and information already mentioned in the previous section.

Please note that it is the express responsibility of each user of this website to ensure that no identifiable data of any third party are uploaded and processed via this website.

When using this data and information, we do not draw any conclusions about the affected person. Rather, this information is needed to

  • deliver the contents of our website correctly,
  • optimize the contents of our website,
  • inform users about the processing of their data,
  • guarantee the permanent operability of our computer systems and the technology of our website and
  • provide law enforcement agencies with the information necessary for prosecution in the event of a cyber attack.

These anonymously collected data and information are statistically evaluated by us in order to increase data protection and data security in our organization and to ultimately ensure an optimal level of protection for the personal data processed by us. The anonymous data of the server log files are stored separately from all personal data provided by the user.

Liability for Links

Our site contains links to external third-party websites over whose content we have no control. We therefore cannot accept any liability for this external content. The respective provider or operator of the linked sites is always responsible for their content. The linked sites were checked for possible violations of law at the time of linking. Illegal content was not apparent at the time of linking. However, continuous monitoring of the linked pages is not reasonable without concrete evidence of a violation. Upon becoming aware of violations, we will remove such links immediately.

Copyright Website

The contents and works created by the operators of this site and provided on these pages are subject to German copyright law. Third-party contributions are marked as such. Reproduction, adaptation, dissemination and any kind of exploitation outside the limits of copyright require the written consent of the respective author or creator. Downloads and copies of these pages are permitted only for private, non-commercial use.

The operators of these pages endeavour to always observe the copyrights of others and to rely on their own or license-free works.

Copyright CADD

CADD scores are freely available for all non-commercial applications. If you are planning on using them in a commercial application, you can obtain a license through the UW CoMotion Express Licensing System. If in doubt about whether you need a license for your application, please contact Martin Kircher, Jay Shendure and Gregory M. Cooper.


CADD License, incl. warranty and liability limitations

CADD is under Copyright from the University of Washington, Hudson-Alpha Institute for Biotechnology and the Berlin Institute of Health at Charité - Universitätsmedizin Berlin (2013-2023). All rights reserved.

Permission is hereby granted, to all non-commercial users and licensees of CADD (Combined Annotation Dependent Framework, licensed by the University of Washington) to obtain copies of this software and associated documentation files (the "Software"), to use the Software without restriction, including rights to use, copy, modify, merge, and distribute copies of the Software. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

© University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health at Charité - Universitätsmedizin Berlin 2013-2023. All rights reserved.