By Ben Edwards
The first EPSS model only scored recent vulnerabilities: those that had been assessed with CVSS 3.1 metrics. One of the goals of the second model was therefore to score all 170,000+ CVEs. But the older vulnerabilities had only been assessed with CVSS 2.0, which presented a problem: we needed a way to score (or at least estimate the metric values for) 100,000 older vulnerabilities. This article explains how we did just that.
To provide scores for these older vulnerabilities, we needed complete data on all of them, and a key piece of that data is the set of CVSS v3 metrics. However, CVSS v3 assessments are available mostly for vulnerabilities created after 2015.
However, because the number of vulnerabilities has grown so much over the last 5 years, there are ample data from which to infer the CVSS v3 metrics of older vulnerabilities. To do this, we took much of the same data that serves as input to EPSS and used it to train an artificial neural network (ANN) to predict the CVSS v3 base vector.
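The article does not describe the network's architecture. One common way to frame predicting a base vector, assumed here, is as eight separate classification problems, one per base metric (for example, one softmax output per metric). The hypothetical helpers parse_base_vector and encode_targets below only show that target encoding; the allowed metric values are the ones defined in the CVSS v3.x specification.

```python
# Hypothetical helpers (not from the article): turn a CVSS v3.x base vector
# into eight categorical labels, one per output of a multi-output classifier.

# Allowed values for each CVSS v3.x base metric.
BASE_METRICS = {
    "AV": ("N", "A", "L", "P"),  # Attack Vector
    "AC": ("L", "H"),            # Attack Complexity
    "PR": ("N", "L", "H"),       # Privileges Required
    "UI": ("N", "R"),            # User Interaction
    "S": ("U", "C"),             # Scope
    "C": ("H", "L", "N"),        # Confidentiality impact
    "I": ("H", "L", "N"),        # Integrity impact
    "A": ("H", "L", "N"),        # Availability impact
}


def parse_base_vector(vector: str) -> dict:
    """Parse 'CVSS:3.1/AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    parts = vector.split("/")
    if not parts[0].startswith("CVSS:3"):
        raise ValueError(f"not a CVSS v3 vector: {vector!r}")
    metrics = dict(part.split(":", 1) for part in parts[1:])
    parsed = {}
    for name, allowed in BASE_METRICS.items():
        value = metrics.get(name)
        if value not in allowed:
            raise ValueError(f"missing or invalid {name} in {vector!r}")
        parsed[name] = value  # temporal/environmental metrics are ignored
    return parsed


def encode_targets(vector: str) -> list:
    """Map each base metric to a class index, in AV, AC, PR, UI, S, C, I, A order."""
    parsed = parse_base_vector(vector)
    return [BASE_METRICS[name].index(parsed[name]) for name in BASE_METRICS]
```

For example, encode_targets("CVSS:3.1/AV:L/AC:H/PR:L/UI:R/S:C/C:L/I:N/A:N") returns [2, 1, 1, 1, 1, 1, 2, 2]. Treating each metric as its own label also makes per-metric accuracy straightforward to measure.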
We developed the model using time-stratified 8-fold cross-validation on the set of vulnerabilities that already had CVSS v3 metrics. The model predicted the exact CVSS v3 vector with 75% accuracy, and each individual metric in the vector was predicted with greater than 93% accuracy.
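The article does not define "time stratified". One plausible reading, assumed in this sketch, is that each of the 8 folds holds the same mix of publication years, so every fold is evaluated on vulnerabilities from every period; the model is then trained on 7 folds and scored on the held-out eighth, 8 times over. The hypothetical functions below assign such folds and compute the two accuracy measures, with each vector represented as a sequence of eight metric labels.

```python
import random
from collections import defaultdict


def time_stratified_folds(years, k=8, seed=0):
    """Assign each record to one of k folds, balancing every year across folds.

    `years` holds one publication year per vulnerability. Within each year the
    records are shuffled, then dealt round-robin onto the folds, so each fold
    receives (almost) the same share of every year.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for idx, year in enumerate(years):
        by_year[year].append(idx)
    folds = [0] * len(years)
    counter = 0  # carried across years so leftover records rotate through folds
    for year in sorted(by_year):
        indices = by_year[year]
        rng.shuffle(indices)
        for idx in indices:
            folds[idx] = counter % k
            counter += 1
    return folds


def exact_vector_accuracy(true_vectors, predicted_vectors):
    """Share of predictions in which all eight metric values match."""
    hits = sum(tuple(t) == tuple(p) for t, p in zip(true_vectors, predicted_vectors))
    return hits / len(true_vectors)


def per_metric_accuracy(true_vectors, predicted_vectors):
    """Accuracy for each metric position (AV, AC, PR, UI, S, C, I, A)."""
    n = len(true_vectors)
    return [
        sum(t[m] == p[m] for t, p in zip(true_vectors, predicted_vectors)) / n
        for m in range(len(true_vectors[0]))
    ]
```

For each fold f, one would train on the records whose fold is not f and evaluate on those whose fold is f; computed on the held-out predictions, exact_vector_accuracy and per_metric_accuracy correspond to the 75% and greater-than-93% figures reported above.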
When the predicted vectors are reduced to the qualitative severity rating (None/Low/Medium/High/Critical), accuracy rises to 88%. Moreover, for 99.9% of predictions the ANN got at least 4 of the 8 metric values right. The model also performed well across all time periods.
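The article does not say how severity was derived from a predicted vector. A natural approach, assumed here, is to compute the CVSS v3.1 base score for both the real and the predicted vector and compare their qualitative ratings. The sketch below implements the base-score equations, the Roundup function and the rating scale from FIRST's CVSS v3.1 specification; parse, base_score, severity, severity_accuracy and share_with_at_least are hypothetical names, and vectors are plain dicts such as {'AV': 'N', ..., 'A': 'H'}.

```python
import math

# CVSS v3.1 base-metric weights.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope: (unchanged, changed).
PR_WEIGHTS = {"N": (0.85, 0.85), "L": (0.62, 0.68), "H": (0.27, 0.5)}


def parse(vector):
    """Hypothetical helper: 'CVSS:3.1/AV:N/...' -> {'AV': 'N', ...} (no validation)."""
    return dict(part.split(":", 1) for part in vector.split("/")[1:])


def roundup(value):
    """CVSS v3.1 Roundup: smallest number with one decimal place >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(m):
    """CVSS v3.1 base score for a dict of the eight base metrics."""
    changed = m["S"] == "C"
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[m["C"]]) * (1 - cia[m["I"]]) * (1 - cia[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * PR_WEIGHTS[m["PR"]][int(changed)] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


def severity(score):
    """Qualitative severity rating scale from the CVSS v3.1 specification."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def severity_accuracy(true_vectors, predicted_vectors):
    """Share of predictions whose severity rating matches the real one."""
    hits = sum(severity(base_score(t)) == severity(base_score(p))
               for t, p in zip(true_vectors, predicted_vectors))
    return hits / len(true_vectors)


def share_with_at_least(true_vectors, predicted_vectors, k=4):
    """Share of predictions that get at least k of the eight metrics right."""
    hits = sum(sum(t[name] == p[name] for name in t) >= k
               for t, p in zip(true_vectors, predicted_vectors))
    return hits / len(true_vectors)
```

For example, base_score(parse("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")) returns 9.8, which rates as Critical. Because many different vectors fall into the same rating band, a prediction can have the wrong exact vector yet the right severity, which is why severity accuracy can exceed exact-vector accuracy.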
Predicted CVSS v3 values are used as EPSS inputs only when actual CVSS v3 scores are unavailable, and an additional variable indicates whether a given CVSS v3 score was predicted or taken from the original vulnerability record. Future work may focus on improving these predictions.
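The fallback and indicator variable described above can be sketched as follows. The record layout and field names (cve, cvss3, cvss3_pred, cvss3_vector, cvss3_is_predicted) are hypothetical; the article only states that predicted values are used when real ones are missing and that an extra variable marks which is which.

```python
from typing import Optional


def cvss3_input(actual: Optional[str], predicted: str) -> tuple:
    """Choose the CVSS v3 vector for one CVE and flag where it came from.

    Returns (vector, is_predicted): the published vector when one exists,
    otherwise the model's predicted vector with is_predicted set to 1.
    """
    if actual:  # None or "" means no CVSS v3 assessment was published
        return actual, 0
    return predicted, 1


def build_rows(records):
    """records: iterable of dicts with hypothetical keys 'cve', 'cvss3', 'cvss3_pred'."""
    rows = []
    for rec in records:
        vector, is_predicted = cvss3_input(rec.get("cvss3"), rec["cvss3_pred"])
        rows.append({
            "cve": rec["cve"],
            "cvss3_vector": vector,
            "cvss3_is_predicted": is_predicted,  # the extra indicator variable
        })
    return rows
```

Keeping the indicator alongside the vector lets the downstream model, if the data support it, treat predicted vectors differently from published ones.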
Being able to estimate CVSS 3.1 scores for 100,000 CVEs (i.e., those from before 2016) contributed significantly to our ability to produce EPSS scores for every CVE.