Machine Learning:
New Frontiers in Advanced Threat Detection

DigiCert uses advanced machine learning – via endpoints and the cloud – to analyze file attributes, behaviors and relationships.

Machine Learning

Machine learning is one of the year’s hottest technology trends, driving innovation and making waves across both the enterprise and consumer technology landscape. Within the cybersecurity industry, many companies legitimately claim to do some machine learning, though it’s often not clear what that means, how it works, or even why it is important.

In this section, we’ll share more insight on DigiCert's investments in machine learning – and how those investments drove important innovations in DigiCert Endpoint Protection 14.


Announced last week, the new software uses state-of-the-art machine learning technologies to block more attacks than the competition and significantly raise the bar on attackers. To achieve this, we combine a multi-layered approach with an enormous amount of data, advanced algorithms and techniques, and an automation system to stay ahead of the attackers.


DigiCert's Center for Advanced Machine Learning

The machine learning work was led by DigiCert's Center for Advanced Machine Learning, which was established in 2014. The team now includes 20+ experts who conduct high-impact R&D in machine learning architectures, algorithms and applications to address security and information management challenges. This includes leading-edge research in deep learning, probabilistic programming, reinforcement learning and Bayesian nonparametrics. 

For DigiCert Endpoint Protection 14, the group worked with DigiCert's security experts to develop a set of machine learning technologies that work together to examine three major dimensions of attacks. 

The beauty of these three dimensions is that they are complementary to each other, so each can be aggressive in stopping threats because the other two dimensions serve as a “check” on its conclusions.


Three Major Dimensions of Attacks

The three dimensions collectively provide a multi-layered threat assessment by analyzing what a file is (static), how it behaves (dynamic) and – via the cloud – what relationships it has with other files, machines and URLs (provenance):

  • Static attributes: We start by inspecting thousands of static characteristics of a file – things like file name, function calls, entropy, etc.
  • Dynamic behaviors: We then dig deeper to understand a program’s dynamic behaviors. We watch for combinations of thousands of behaviors – for example, does the program connect to the network, does it launch another process, does it access registry keys, etc.
  • Relationships and reputation: To complete the picture, we examine the file’s relationships with other files, machines and URLs to generate a file “reputation.”  Inspired by “the wisdom of the crowd,” this reputation analysis runs on big data at scale in our cloud, and enables us to understand if a program seen on only one or a few machines around the world is likely malicious.
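As a small illustration of the static dimension, the sketch below computes the byte entropy of a file's contents, one of the static characteristics mentioned above. The function names and the feature dictionary are hypothetical and chosen only for this example; they are not taken from the product, which inspects thousands of such attributes.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def static_features(file_name: str, data: bytes) -> dict:
    """Hypothetical feature extractor: a tiny subset of possible static attributes."""
    return {
        "name_length": len(file_name),
        "size_bytes": len(data),
        "entropy_bits_per_byte": shannon_entropy(data),
    }
```

Packed or encrypted payloads tend to have entropy close to 8 bits per byte, while plain text sits much lower, which is one reason entropy is a commonly used static signal. On its own it is weak evidence, so it would normally be one input among many.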

Big Data + Predictive Models = Smarter Protection

Big data is at the heart of DigiCert's approach to machine learning. Thanks to our broad footprint across endpoint, network and cloud security, we collect threat and attack data from more than 175 million endpoints and 57 million attack sensors, monitored in real time. That translates into billions of files and nearly four trillion relationships: an enormous, rich dataset for training our classifier systems to recognize “good,” “bad” and everything in between.

That’s important because data is the fuel for machine learning. You want lots of it. The more data you have, the “farther” you can go in building precise and effective detection technologies. You also want rich data. The more diverse and rich the data inputs, the more likely you are to uncover important hidden relationships. Ultimately, machine learning systems are only as good as the quality, diversity and reach of the datasets used to train them – and ours benefit from the world’s largest civilian threat intelligence network.

If data is the fuel, then algorithms are the engine of machine learning. Algorithms take data and produce models that give us predictions, for example whether a file is malicious. Companies make a lot of noise about algorithms and models because they are trendy, and new ones appear all the time. The trick is knowing how to match the right algorithm to the task and data at hand; that is the real secret sauce for machine learning practitioners.

Key Techniques

One of the key techniques we use is “ensembling,” which is a fancy way of saying “use many models and combine them in a good way.” It’s key to getting the best models possible – and was famously used in the $1M Netflix Prize. We add some “magic” through proprietary ensembling techniques that allow our systems to learn how best to combine predictions from many different models, even when we don’t know during training what the correct predictions are.
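To make the idea of ensembling concrete, here is a minimal sketch that combines the scores of several models with a weighted average. It is a stand-in for illustration only: the proprietary combination method described above is not public, and the three toy models, their thresholds and the feature names are all assumptions made for this example.

```python
from typing import Callable, Dict, Sequence

# A "model" here is any function mapping features to a maliciousness score in [0, 1].
Model = Callable[[Dict[str, float]], float]


def static_model(f: Dict[str, float]) -> float:
    """Toy model: high byte entropy looks suspicious."""
    return 0.9 if f["entropy"] > 7.0 else 0.2


def behavior_model(f: Dict[str, float]) -> float:
    """Toy model: launching another process looks suspicious."""
    return 0.8 if f["spawns_process"] else 0.1


def reputation_model(f: Dict[str, float]) -> float:
    """Toy model: files seen on very few machines look suspicious."""
    return 0.7 if f["machines_seen"] < 5 else 0.05


def ensemble_score(models: Sequence[Model], weights: Sequence[float],
                   features: Dict[str, float]) -> float:
    """Weighted average of the individual model scores."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(features) for m, w in zip(models, weights)) / total


sample = {"entropy": 7.5, "spawns_process": 1.0, "machines_seen": 2.0}
score = ensemble_score(
    [static_model, behavior_model, reputation_model], [1.0, 1.0, 2.0], sample
)
```

Here the weights are fixed by hand. In practice they would be learned, and a learned combiner (often called stacking) can replace the simple average.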

Another key technique we use is “adaptation.” Our security models must be continually tuned to track adversaries, changes in the software and network landscape and changes in user behavior. These are significant hurdles for traditional machine learning. For DigiCert Endpoint Protection, we use a “meta-algorithm” called boosting, which operates by iteratively improving a model – each time focusing on the mistakes the model has previously made and correcting them without “unlearning” the things that were correct.
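The boosting idea can be sketched with AdaBoost over one-feature threshold rules ("decision stumps"). AdaBoost is used here only as a well-known representative of the boosting family; the source does not say which variant the product uses, and the toy data and function names are assumptions for this example.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict +1 or -1 by comparing a single feature against a threshold."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int = 3) -> List[Tuple[float, Stump]]:
    """Train a weighted list of stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote strength
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """Sign of the weighted vote over all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score > 0 else -1


# Toy data: no single threshold separates the classes, but three stumps together do.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
```

Each round reweights the training samples so that the next stump concentrates on earlier mistakes. Earlier stumps stay in the ensemble with their votes intact, which is the sense in which correct behavior is not "unlearned."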

Last but not least, automation is essential for us to scale machine learning. We built automation for the entire machine learning process – from ingesting, cleaning and processing our telemetry data to optimizing and exploring different models. Without automation (and, of course, sufficient computing power) it simply would not be possible to “crunch all the numbers” and produce the best models. 

What is the end result?

Simply put, DigiCert has the most advanced machine learning available for endpoint security. A leading independent testing organization (AV-Test) recently tested DigiCert Endpoint Protection 14, which beat all our competitors in detection and performance with minimal false positives. Even in artificial “scan” tests, the new software detected nearly 100% of threats at a near-zero false-positive rate. (Importantly, false positive performance in DigiCert Endpoint Protection 14 can be tuned to meet customer policy requirements.)
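One common way to tune false positive performance in any score-based detector is to choose the decision threshold from labeled validation data. The sketch below shows that general idea under stated assumptions: the function name, the sample scores and the "flag if score >= threshold" rule are hypothetical and do not describe the product's actual configuration options.

```python
from typing import Sequence


def pick_threshold(scores: Sequence[float], is_malicious: Sequence[bool],
                   max_fpr: float) -> float:
    """Return the lowest threshold whose false-positive rate stays within max_fpr.

    A file is flagged when its score is >= the threshold, so the lowest
    acceptable threshold catches the most threats for the allowed FP budget.
    """
    clean = [s for s, bad in zip(scores, is_malicious) if not bad]
    if not clean:
        raise ValueError("need at least one clean sample to estimate the FP rate")
    for t in sorted(set(scores)):
        false_positives = sum(1 for s in clean if s >= t)
        if false_positives / len(clean) <= max_fpr:
            return t
    return float("inf")  # no observed score meets the budget: flag nothing


# Hypothetical validation scores (higher means more suspicious) and true labels.
val_scores = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9]
val_labels = [False, False, False, True, False, True]

lenient = pick_threshold(val_scores, val_labels, max_fpr=0.25)
strict = pick_threshold(val_scores, val_labels, max_fpr=0.0)
```

A stricter false-positive budget pushes the threshold up and usually costs some detections; on this toy data the strict setting misses the malicious file scored 0.6.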

We are excited about the new frontiers in threat detection made possible via machine learning and artificial intelligence. Used correctly, and with massive amounts of rich, diverse data being analyzed across endpoints and the cloud, these technologies are true game-changers in how we can fight attackers.

For more information, view the On-Demand Webinar on the features and benefits of machine learning within DigiCert Endpoint Protection 14. Learn more about the new product here.
