Got Machine Learning?
Machine learning has been an industry buzzword for the last few years, and potential customers ask for it more and more often.
Unfortunately, machine learning is rarely sought as a solution to a particular problem; rather, it is simply treated as an item on a checklist.
This is analogous to asking a contractor if they use a hammer. When the contractor uses the hammer, and on what, is far more important. Machine learning, like any other buzzword, will never be the solution to every problem. Some problems will require a machine learning solution; some problems will require a hammer.
An approach, not a solution
Unlike a hammer, machine learning has a wide range of applications and many variations. Each variation has its own strengths and weaknesses. A successful implementation will find an algorithm that fits the problem, which means you must first have a good understanding of the problem. Any conversation that starts with “how can we use machine learning?” is moving in the wrong direction.
Applying the wrong machine learning algorithm in the wrong context will consume large amounts of resources and yield, at best, little to no information or, worse, misinformation. Don’t force an ill-fitting machine learning solution just to check off a box on a checklist.
For most solutions, typical machine learning implementations follow the pattern:
- Normalize, or preprocess, the dataset,
- Construct multiple machine learning models,
- Evaluate the models for best fit,
- Apply the best model to production systems.
This works well in cases where the dataset is fairly consistent, like facial recognition. Perhaps the facial recognition system in a museum of modern art has to work a little harder in the Picasso wing, but for the most part, two eyes with one nose and one mouth, more or less in the middle, make a face. This is, of course, a flagrant oversimplification of facial recognition, but the point is that once that model is built, it generally will not become obsolete until faces drastically change.
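As a rough illustration, the four-step pattern above can be sketched in a few lines of plain Python. Everything here is an assumption made for the example: the toy traffic numbers, the feature names, and the choice of a k-nearest-neighbour classifier as the candidate model are not taken from any production system.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Step 1a: learn per-feature mean and standard deviation from training data."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]

def scale(rows, stats):
    """Step 1b: standardize each feature with the statistics learned above."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def make_knn(k, train_x, train_y):
    """Step 2: build one candidate model (k-nearest-neighbour majority vote)."""
    def predict(x):
        nearest = sorted(
            zip(train_x, train_y),
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )[:k]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)
    return predict

def accuracy(model, xs, ys):
    """Step 3: fraction of held-out samples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

# Hypothetical samples: [requests per second, average payload in KB]
train_raw = [[5, 2], [7, 3], [6, 2], [8, 4], [90, 1], [95, 1], [85, 2], [99, 1]]
train_y = ["normal"] * 4 + ["attack"] * 4
valid_raw = [[6, 3], [92, 1]]
valid_y = ["normal", "attack"]

stats = fit_scaler(train_raw)
train_x = scale(train_raw, stats)
valid_x = scale(valid_raw, stats)

# Odd values of k avoid tied votes between the two classes.
candidates = {k: make_knn(k, train_x, train_y) for k in (1, 3, 5)}
scores = {k: accuracy(model, valid_x, valid_y) for k, model in candidates.items()}

# Step 4: the best-scoring candidate is the one that would go to production.
best_k = max(scores, key=scores.get)
best_model = candidates[best_k]
```

Note that the model is chosen once and then frozen, which is exactly the property that becomes a problem when the data keeps changing.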
Internet traffic, which is the subject that DOSarrest is most concerned with, is anything but consistent. For the HTTP protocol alone, in the last few years we have gone from HTTP/1.1 to SPDY to HTTP/2, all of which are distinct enough from each other to make a previously generated model obsolete. Furthermore, we see over one thousand new vulnerabilities (and consequently attacks) each year, which our models must be able to detect. This tendency of a dataset to change over time, moving further away from what the model was trained on, is called concept drift. And when we went looking for needles in the world wide haystack, concept drift was not something we wanted to have to contend with.
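To make concept drift concrete, here is a minimal sketch with made-up numbers. A single-threshold classifier is trained on "before" traffic and then scored against "after" traffic whose distribution has shifted. The request rates and labels are purely hypothetical; only the effect matters.

```python
def train_threshold(samples):
    """Learn a cut-off halfway between the mean of each class."""
    normal = [v for v, label in samples if label == "normal"]
    attack = [v for v, label in samples if label == "attack"]
    return (sum(normal) / len(normal) + sum(attack) / len(attack)) / 2

def accuracy(threshold, samples):
    """Fraction of samples where 'above the threshold' matches the attack label."""
    hits = sum((v > threshold) == (label == "attack") for v, label in samples)
    return hits / len(samples)

# Hypothetical requests per second per client, at training time.
before = [(10, "normal"), (12, "normal"), (14, "normal"),
          (60, "attack"), (70, "attack")]

# The same kind of traffic later on: legitimate clients have become busier
# and one attack has become quieter, so the old cut-off no longer separates them.
after = [(30, "normal"), (45, "normal"), (50, "normal"),
         (35, "attack"), (80, "attack")]

threshold = train_threshold(before)          # trained once, never updated
accuracy_before = accuracy(threshold, before)
accuracy_after = accuracy(threshold, after)  # noticeably lower than before
```

Nothing in the model changed between the two measurements; the data simply moved away from it.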
Solution 1: Incremental Learning
Incremental learning is a newer approach in machine learning designed to cope with concept drift. It constantly rebuilds the model from new data and previous models, producing a constantly evolving model. While there are numerous algorithms that operate this way, most standard machine learning toolkits implement very few of them. Furthermore, they require dedicated processing and large amounts of data in memory for constant retraining.
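The idea can be illustrated with a deliberately tiny stand-in: a nearest-centroid classifier whose centroids are nudged toward every new labelled sample. The class name, the learning rate, and the traffic values below are assumptions for the sketch; real incremental algorithms are considerably heavier than this.

```python
class IncrementalCentroids:
    """Nearest-centroid classifier that is updated one sample at a time."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.centroids = {}  # label -> running estimate of that class's centre

    def learn(self, value, label):
        """Move the label's centroid a fraction of the way toward the new value."""
        if label not in self.centroids:
            self.centroids[label] = float(value)
        else:
            old = self.centroids[label]
            self.centroids[label] = old + self.learning_rate * (value - old)

    def predict(self, value):
        """Return the label whose centroid is closest to the value."""
        return min(self.centroids, key=lambda lbl: abs(self.centroids[lbl] - value))


model = IncrementalCentroids()

# Initial (hypothetical) training data: requests per second.
model.learn(10, "normal")
model.learn(60, "attack")
prediction_before = model.predict(45)   # closer to the attack centroid

# Drifted traffic arrives; the model keeps learning instead of being rebuilt.
for value, label in [(40, "normal"), (90, "attack"), (45, "normal"),
                     (100, "attack"), (50, "normal")]:
    model.learn(value, label)

prediction_after = model.predict(45)    # the normal centroid has moved to 42.5
```

The same request rate is classified differently before and after the stream, because the model has followed the data rather than staying fixed at its training-time view.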
Solution 2: Rendezvous Pattern
Because of the difficulty in finding an implementation of incremental learning that fit our needs and integrated well with our existing systems, we instead turned to an alternate pattern we had already implemented in other systems. While not specific to machine learning, the Rendezvous Pattern can easily be adapted to machine learning tasks. It allows us to evaluate and/or train multiple models concurrently on the same dataset in real time and select the best-fitting model for the moment. Inaccurate or obsolete models get replaced with newer models.
A full copy of the incoming data is sent to each potential model. Any models still in training use the new data, in addition to recent historical data, to train a new model. If the new model performs well, it may be considered as an alternate model.
After each model is applied to the incoming data, their respective results are returned to an evaluator, which selects the best response to be submitted back to the requester. As models start losing accuracy to concept drift, they are eventually disregarded by the evaluator and finally replaced by a better-fitting model.
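A minimal sketch of that flow is shown below. It is not DOSarrest's implementation: the `Candidate` class, the `rendezvous` and `feedback` functions, the two threshold models, and the traffic values are all hypothetical. It also simplifies two things: the evaluator ranks models by cumulative accuracy rather than recent accuracy, and the correct label is assumed to be available immediately, which is rarely true in practice. Training new candidates is left out.

```python
from concurrent.futures import ThreadPoolExecutor

class Candidate:
    """A model plus the track record the evaluator uses to rank it."""

    def __init__(self, name, predict):
        self.name = name
        self.predict = predict
        self.hits = 0
        self.seen = 0

    @property
    def score(self):
        return self.hits / self.seen if self.seen else 0.0

    def record(self, was_correct):
        self.seen += 1
        self.hits += int(was_correct)

def rendezvous(candidates, request, pool):
    """Fan the request out to every candidate, then let the evaluator pick."""
    # Every candidate receives a full copy of the request and runs concurrently.
    answers = list(pool.map(lambda c: c.predict(request), candidates))
    # Evaluator: trust the candidate with the best track record so far
    # (ties go to the candidate listed first).
    best = max(range(len(candidates)), key=lambda i: candidates[i].score)
    return candidates[best].name, answers[best], answers

def feedback(candidates, answers, truth):
    """Update each candidate's track record once the true label is known."""
    for candidate, answer in zip(candidates, answers):
        candidate.record(answer == truth)

# Two hypothetical threshold models on requests per second.
old = Candidate("old", lambda rps: "attack" if rps > 38.5 else "normal")
new = Candidate("new", lambda rps: "attack" if rps > 65 else "normal")
candidates = [old, new]

traffic = [(10, "normal"), (70, "attack"), (45, "normal"),
           (50, "normal"), (80, "attack"), (40, "normal")]

chosen = []
with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
    for rps, truth in traffic:
        name, response, answers = rendezvous(candidates, rps, pool)
        chosen.append(name)
        feedback(candidates, answers, truth)
```

In this run the evaluator starts out answering with the old model, then switches to the new one as soon as the old model's track record falls behind. Retiring the old model entirely would be a separate decision.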
In both of the above solutions, it is tempting to always keep the best-fitting model and discard or disregard models that are currently poor performers. Doing so is risky.
Because we are dealing with inconsistent data, we don’t see every case in every iteration. Simply put, a model created and evaluated without attack data will be less accurate at classifying an attack than a model created with attack data, even though the attack model would be less accurate in most other situations. Because we want to retain the ability to identify a wide array of uncommon events, it is more important to consider model diversity, rather than accuracy alone, when deciding which models are obsolete.
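One simple way to express "diversity over accuracy alone" is to keep every model that is the best performer on at least one class of traffic, instead of keeping only the overall winner. This is a sketch of that one possible rule, not the selection logic we run; the model names and per-class accuracy figures are invented for illustration.

```python
def overall_best(per_class_accuracy):
    """Naive rule: keep only the model with the highest average accuracy."""
    return max(
        per_class_accuracy,
        key=lambda m: sum(per_class_accuracy[m].values()) / len(per_class_accuracy[m]),
    )

def diverse_keep_set(per_class_accuracy):
    """Diversity-aware rule: keep the best model for each class of traffic."""
    classes = {cls for scores in per_class_accuracy.values() for cls in scores}
    keep = set()
    for cls in classes:
        best = max(per_class_accuracy,
                   key=lambda m: per_class_accuracy[m].get(cls, 0.0))
        keep.add(best)
    return keep

# Hypothetical per-class accuracy for three models.
per_class_accuracy = {
    "generalist":     {"normal": 0.99, "attack": 0.80},
    "attack_trained": {"normal": 0.80, "attack": 0.95},
    "stale":          {"normal": 0.70, "attack": 0.30},
}

naive = overall_best(per_class_accuracy)    # only the generalist survives
keep = diverse_keep_set(per_class_accuracy) # the attack specialist survives too
obsolete = set(per_class_accuracy) - keep   # only the stale model is dropped
```

Under the naive rule the attack-trained model would be discarded, even though it is the one you want when an attack actually shows up.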
Critical success or Critical failure
As mentioned, machine learning comes in many flavors. The above implementations happen to work well for our particular problems; they may or may not be suited to yours. If I can leave you with one thing to take away, it would be this: if your instinct is simply to throw machine learning at a problem and hope for a solution, you’re probably going to be better off throwing a hammer instead. In the end it will likely cause less grief.
Senior Application Security Architect, DOSarrest Internet Security