The Evolution of Applied Machine Learning in Financial Services

September 21, 2022 by Rebecca Graham

Applied ML in compliance

I have taken on a variety of roles in my career (analyst, data scientist, product manager) in a variety of domains (aerospace, defense, intelligence, financial services), but my work has always been in the applied space. I love the process of taking promising new technology out of the research lab and applying it in the field, where it can deliver value to real people working on real problems.

Inevitably, when new technology leaves the research lab and meets reality, limitations and risks are exposed. That is the fun part of working in the applied space! I get to refine the technologies so that they meet real-world requirements and then share those insights back into the research lab for further improvements.


In the FinTech space, I have seen this process play out with the introduction, adoption, and refinement of machine learning technologies in compliance monitoring. Initially, machine learning-based solutions were introduced to the compliance monitoring space as a replacement for existing lexicon-based solutions.

Undoubtedly, machine learning offered clear advantages over purely lexicon-based solutions, but it also came with some challenges. Coming out of the research lab, machine learning technologies were typically optimized for analytic quality and not necessarily designed to meet the needs of a regulatory environment.

For the past five years or so, applied machine learning engineers and compliance experts have worked together, identifying challenges and opportunities and iteratively refining machine learning-based solutions.

At Smarsh, we have coined the term "regulatory-grade AI" to describe this refined approach. We are now at an exciting milestone, where this years-long effort is culminating in our release of Enterprise Conduct. I would like to take this opportunity to review the advancements, challenges, and solutions that got us here.

The three generations of compliance analytics

Think of compliance monitoring analytics as having gone through three generations:

  1. Lexicon-based
  2. Machine learning-based
  3. Regulatory-grade AI

Let's explore how applied methods have evolved. 

The lexicon approach

The first generation of conduct surveillance analytics relied on lexicons. Lexicons come in many shapes and sizes, from simple lists of keywords to complex collections of rules and patterns. Whichever form they take, they use an "if-then" approach to raising compliance alerts: if specific criteria appear in the text, then raise an alert. The criteria (e.g., keywords, patterns, and rules) are all defined by human subject matter experts, who often maintain and refine these lexicons over a period of years.
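To make the "if-then" mechanics concrete, here is a minimal Python sketch of a regex-based lexicon. The rule names, the patterns, and the `lexicon_alerts` helper are hypothetical illustrations, not a real Smarsh lexicon; production lexicons are far larger and more elaborate.

```python
import re
from dataclasses import dataclass


@dataclass
class LexiconRule:
    """One human-defined criterion: a name plus the pattern that triggers it."""
    name: str
    pattern: re.Pattern


# Hypothetical rules written by a subject matter expert.
LEXICON = [
    LexiconRule("keep_quiet", re.compile(r"\bdon'?t (tell|mention)\b", re.IGNORECASE)),
    LexiconRule("guaranteed_returns", re.compile(r"\bguaranteed? returns?\b", re.IGNORECASE)),
]


def lexicon_alerts(text: str) -> list[str]:
    """If a rule's pattern appears in the text, then raise an alert named after that rule."""
    return [rule.name for rule in LEXICON if rule.pattern.search(text)]


print(lexicon_alerts("Don't tell anyone about the trade"))  # ['keep_quiet']
print(lexicon_alerts("Lunch at noon?"))                      # []
```

Every alert traces back to a rule a person wrote, which is what makes a lexicon transparent and also what limits it.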

The lexicon approach has its limitations, which generally fall into two categories:

Too many false positives. While not all lexicons are created equal (some are carefully crafted to have much better precision than others), the common complaint from compliance teams is that their lexicon-based policies alert on too much irrelevant content. It is a tough problem. Real violations are rare, so most alerts are likely to be false positives. However, compliance teams do not want to miss anything REAL, so they are reluctant to narrow the scope of their lexicons too much.

An inability to spot "unknown unknowns." A lexicon can only raise an alert if pre-defined criteria are met. In other words, a lexicon cannot "find" anything that a human has not explicitly told it to find. And a human cannot define criteria to cover a situation that they do not know about or do not anticipate. These "unknown" situations are not necessarily esoteric or rare. For instance, a misspelled or abbreviated word will foil a lexicon. Monitored employees will sometimes exploit this weakness intentionally, knowing that keyword-based monitoring is easily circumvented. Analysts can always close these gaps as they find them by adding more complex rules to their lexicons. However, this maintenance process can be burdensome, and the risk of the "unknown unknown" remains.
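The misspelling problem is easy to reproduce. In this sketch (the pattern and messages are hypothetical), the expert anticipated a dropped apostrophe, so "dont tell" is still caught, but a one-letter typo or an abbreviation slips through because nothing in the rule describes it.

```python
import re

# Hypothetical rule: the expert anticipated an optional apostrophe, nothing more.
KEEP_QUIET = re.compile(r"\bdon'?t tell\b", re.IGNORECASE)

messages = [
    "Don't tell compliance about this",  # exact phrase
    "dont tell compliance about this",   # missing apostrophe, anticipated by the rule
    "don't tel compliance about this",   # typo nobody anticipated
    "DT compliance about this",          # abbreviation nobody anticipated
]

results = [bool(KEEP_QUIET.search(msg)) for msg in messages]
for msg, hit in zip(messages, results):
    print("ALERT" if hit else "missed", "|", msg)
# Prints ALERT, ALERT, missed, missed
```

Closing each gap means another edit to the rule (for example, changing `tell` to `tell?` so that "tel" also matches), which is exactly the maintenance burden described above, and the next unanticipated variant still goes uncaught.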

Machine learning

Then the research lab gave us the second generation of conduct surveillance analytics: machine learning.

In a way, machine learning models and lexicons are built for the same purpose. They both look at a given communication and decide whether to generate an alert. The difference comes in how they make this decision. A lexicon uses a set of human-defined criteria to make a yes/no decision. A machine learning model, on the other hand, generates its own set of criteria by looking at example data provided by humans.
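The contrast can be shown in a few lines: instead of writing criteria, we hand the computer labeled examples and let it derive its own word weights. The sketch below uses a multinomial naive Bayes classifier purely because it fits in a short, dependency-free block; the post does not say what kind of model Smarsh uses, and the training sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; apostrophes are kept so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes: derives per-word weights from labeled examples."""

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayes":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens: list[str], label: int) -> float:
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)  # add-one (Laplace) smoothing
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))  # class prior
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training carry no evidence
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict_proba(self, text: str) -> float:
        """Probability (0.0 to 1.0) that the text belongs to the alert class (label 1)."""
        tokens = tokenize(text)
        diff = self._log_score(tokens, 1) - self._log_score(tokens, 0)
        # Numerically stable logistic function of the log-odds.
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        return math.exp(diff) / (1.0 + math.exp(diff))


# Invented training examples: 1 = secretive, 0 = benign.
texts = [
    "don't mention this to anyone",
    "we can't let anyone know about this",
    "keep this between us",
    "lunch at noon works for me",
    "please send the quarterly report",
    "see you at the meeting tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(texts, labels)
print(round(model.predict_proba("please don't let anyone know"), 2))  # 0.93
print(round(model.predict_proba("send the report tomorrow"), 2))      # 0.04
```

No person typed in the per-word weights; change the examples and the criteria change with them. That is the source of both the quality gains and the loss of direct control discussed below.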

The computer is WAY more effective at analyzing example data and creating optimized criteria than the human expert is (sorry, human), which means the machine learning model ends up with a MUCH better set of criteria than the lexicon does. As a result:

  1. A better set of criteria yields better analytical results: fewer false positive alerts and fewer missed alerts compared to lexicons
  2. Due to its more nuanced criteria, the machine learning model isn't limited to yes/no decisions: it produces a probability score (0% to 100%) representing how likely the communication is to contain a valid alert

For instance, a given Bloomberg chat message might be scored as 25% likely to have a valid Market Manipulation alert in it, or an email might be scored as 85% likely to have a customer complaint in it. This scored approach enables users to widen or narrow the aperture of their surveillance or focus their attention on specific bands of risk probability.
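Here is a small sketch of how a probability score lets users widen, narrow, or band their review. The `ScoredMessage` type, the message IDs, and the 0.55 score are hypothetical; the 25% and 85% scores mirror the example above.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    message_id: str
    policy: str
    score: float  # model probability that the message contains a valid alert, 0.0 to 1.0


def alerts_at_threshold(messages: list[ScoredMessage], threshold: float) -> list[ScoredMessage]:
    """Widen (lower threshold) or narrow (higher threshold) the surveillance aperture."""
    return [m for m in messages if m.score >= threshold]


def in_band(messages: list[ScoredMessage], low: float, high: float) -> list[ScoredMessage]:
    """Focus on one band of risk probability: low <= score < high."""
    return [m for m in messages if low <= m.score < high]


messages = [
    ScoredMessage("chat-001", "market_manipulation", 0.25),
    ScoredMessage("email-042", "customer_complaint", 0.85),
    ScoredMessage("email-043", "customer_complaint", 0.55),
]

print([m.message_id for m in alerts_at_threshold(messages, 0.80)])  # ['email-042']
print([m.message_id for m in alerts_at_threshold(messages, 0.20)])  # all three
print([m.message_id for m in in_band(messages, 0.50, 0.80)])        # ['email-043']
```

A lexicon offers no equivalent dial: a message either matches a rule or it does not.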

This probability score approach, coupled with the overall better analytic results, goes a long way to addressing the major limitations posed by lexicons. Machine learning models are less susceptible to false positives and are not as easily foiled by the "unknown unknowns," such as abbreviations or misspellings.


All of this sounds great, and study after study demonstrates the benefits of a machine-learning approach over traditional lexicon-based methods. But what happens in practice when we take this machine learning technology from the lab and apply it in real-world compliance monitoring?

Adopting machine learning in place of the human-created lexicons requires a sort of trust-fall into the arms of machine learning. We are asking analysts to give up the lexicons they have carefully crafted over the years and instead trust a "black box" of machine learning.

The analysts can no longer define (or in some cases even see!) the criteria being used for alerting. And machine learning models can often produce unexpected results, leaving analysts to wonder what the alerting criteria even are and undermining their trust in the model. (The analyst is left thinking, "If the model made this mistake, what other mistakes might it make?")

This uncertainty is an uncomfortable place to be for an analyst working in a highly regulated environment where mistakes can have significant financial and legal impacts and the possibility of an audit is always looming.


Let us assume for the moment that we have overcome this hurdle of trust. The next step is to refine the machine learning model so that it meets the unique needs of the compliance department in which it is being deployed. Different organizations have different regulatory needs, organizational cultures, internal policies, and so on, all of which need to be handled by the machine learning model.

Knowing that the model will not work exactly how users want it to out of the box, the traditional machine learning response is to add more example data to the model until it does learn the desired behavior. My colleagues at Smarsh have dubbed this the "monolithic model approach." Over the years, we have encountered challenges applying this approach in the compliance space.

I will illustrate these challenges using a model to detect secretive behavior as an example. Compliance teams commonly monitor employee communications for secrecy behaviors because those are highly correlated with conduct concerns. Simply put, if people are being secretive, it may mean they are doing something they should not.

Machine learning seems like an excellent choice for finding secretive behavior in language (think phrases like "don't mention this to anyone" or "we can't let anyone know about this"). Machine learning greatly outperforms lexicons when it comes to finding this kind of nuanced human behavior in text.


However, when the model is applied to real-world problems, we begin to see some issues.

First, the secrecy model thinks that all email disclaimer language is a secret. Text like "The information contained in this email communication may be confidential. Do not distribute to unauthorized recipients" quite reasonably seems like a secret to our model. This misunderstanding poses a significant problem in the field, where most emails contain some disclaimer language, potentially resulting in thousands of useless secrecy alerts every day.

Addressing this problem with a traditional machine learning approach (again, the "monolithic model" approach, as we have begun calling it at Smarsh) involves teaching the model that we want it to find secrecy, but not disclaimer secrecy. To accomplish this, we might add a few dozen or a few hundred samples to teach the model the new behavior we want. Now, (hopefully) the disclaimer problem has gone away, and (hopefully) this work has not had any unintended consequences for our ability to find real secrecy.


The next problem the analysts might want to tackle is off-topic secrecy alerts. Alerts like, "it's a secret family recipe so I can't share" or "don't tell anyone I took the last cupcake, lol" are valid secretive language, but certainly not something that compliance teams are interested in seeing. How do we teach the model that we want secrecy but not that kind of secrecy? Again, using the monolithic model approach, we might decide to feed those examples into the model as negative examples, thereby teaching the model to look for secrecy, but not disclaimer secrecy and not secrecy around topics like cupcakes or family recipes or surprise parties or Secret Santa or the secret to shiny hair, etc.

You can see that what began as a simple, clear concept of secrecy is now becoming quite complex. As we make the task more complex, we introduce more risk into our solution. The model may become confused and have a reduced ability to find the secrecy behaviors we do want. The humans who are maintaining the model might also get confused. They might have trouble remembering what "counts" as secrecy when providing examples to the model. As a result, they might provide conflicting training examples to the model, which degrades the model further.

To be fair, even with these monolithic model problems, we consistently see the machine learning approach outperform legacy lexicon-based solutions in terms of alert quality. However, this process is still frustrating for users and doesn’t fully deliver on the promise of machine learning.

Finally, in the compliance domain, machine learning models are not "black and white" and deterministic like lexicons are. In a way, when it comes to compliance, machine learning's greatest strength is also its greatest weakness. Machine learning does not allow (or at least does not easily allow) an analyst to define deterministic "if-then" rules like they can with lexicons.

For instance, an analyst cannot tell a model, "If you ever encounter these exact words, always generate an alert." Instead, the machine learning model is going to learn its own rules and use them to determine probability scores. So, we might have a concerning email on which an analyst would definitely want to see an alert, but the model might assign it a 79% probability. If the analyst has their alerting threshold set at 80%, they are not going to see an alert.
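The 79%-versus-80% case reduces to a single comparison, sketched below. The `should_alert` helper is hypothetical, and treating the threshold as inclusive (a score of exactly 0.80 alerts) is an assumption. The point is that a pure threshold policy has no input through which an analyst could say "always alert on these words"; the only lever is the cutoff itself.

```python
ALERT_THRESHOLD = 0.80  # the analyst's configured cutoff in the example above


def should_alert(score: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """Pure threshold policy: alert only when the model's probability reaches the cutoff."""
    return score >= threshold


concerning_email_score = 0.79  # the model's score for an email the analyst would want to see
print(should_alert(concerning_email_score))  # False: the alert is silently missed
```

Lowering the cutoff to catch this one email would also admit every other message that scores between the new and old thresholds.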

Users appreciate the unpredictability of machine learning models when it means that the model returns something interesting that they had not thought of before. But they do NOT like that unpredictability when it means the model might NOT return something that they want it to return.


In sum, machine learning models do offer analytic quality improvements over lexicon-based solutions, but those quality improvements do not negate users' frustration and concern about lack of control, lack of explainability, and not having the confidence to really know FOR SURE that the model will alert on certain phrases (all of which are critical in a regulated environment!)

Regulatory-grade AI

So where does that leave us? How do we leverage the power of machine learning while at the same time providing the predictability, control, and explainability required in a regulatory environment?

At Smarsh, our answer is a new analytic framework that we call Scenarios. This approach represents all the insights we have learned in the field and the techniques we have developed in collaboration with compliance experts.

What we have done is decompose the problem into component parts. Instead of a monolithic model, we now leverage multiple components, each implemented with the most suitable approach, whether it is a machine learning model or a lexicon.

A Scenario looks for a combination of signals to raise an alert and allows users to leverage both lexicons and models together, joined with Boolean logic and managed in a no-code user interface.
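The post describes Scenarios as signals joined with Boolean logic in a no-code interface; it does not describe the underlying implementation. As a hedged illustration of the idea only, the sketch below represents each signal as a function from message text to True/False and composes signals with `all_of`, `any_of`, and `not_`. All of these names, and the stub model, are hypothetical.

```python
import re
from typing import Callable

# A signal inspects a message and answers yes or no.
Signal = Callable[[str], bool]


def lexicon(pattern: str) -> Signal:
    """Signal backed by a transparent, analyst-editable pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda text: bool(compiled.search(text))


def model_above(score_fn: Callable[[str], float], threshold: float) -> Signal:
    """Signal backed by a model's probability score."""
    return lambda text: score_fn(text) >= threshold


def all_of(*signals: Signal) -> Signal:  # Boolean AND
    return lambda text: all(s(text) for s in signals)


def any_of(*signals: Signal) -> Signal:  # Boolean OR
    return lambda text: any(s(text) for s in signals)


def not_(signal: Signal) -> Signal:  # Boolean NOT
    return lambda text: not signal(text)


def stub_secrecy_score(text: str) -> float:
    """Stand-in for a trained secrecy model; returns made-up probabilities."""
    return 0.9 if "anyone" in text.lower() else 0.1


scenario = all_of(
    model_above(stub_secrecy_score, 0.8),
    not_(lexicon(r"\b(cupcakes?|recipes?|surprise part(y|ies))\b")),
)

print(scenario("Don't tell anyone about the block trade"))    # True
print(scenario("Don't tell anyone I took the last cupcake"))  # False
```

Because each component stays separate, the model is never retrained to learn "not cupcakes"; the exclusion lives in a lexicon an analyst can read and audit.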

The Traditional Approach vs. the Smarsh Regulatory-Grade AI Approach

Traditional approach: Monolithic model that is difficult to adapt
  • One model detects multiple types of risk
  • New use cases require retraining the model
  • Retraining the model can reduce accuracy
Regulatory-grade AI approach: Discrete models that are easily augmented
  • Discrete models detect specific risk types
  • Augmented with lexicons and filters in cognitive scenarios
  • New use cases handled by augmentation, not model retraining
  • Enables greater accuracy even as use cases expand

Traditional approach: Challenging to explain to regulators
  • Each refinement changes the model in a unique way
  • Explaining multiple iterations is difficult to impossible
Regulatory-grade AI approach: Easy to explain to regulators
  • Cognitive scenarios built by Smarsh
  • No model training required
  • Scenario refinements handled in the augmentation layer
  • Each augmentation is easy to explain
  • Augmentation layers are based on field-proven uses

Traditional approach: Cumbersome to maintain
  • Every retraining requires an internal model risk management (MRM) audit
  • Audits are time-consuming and costly
Regulatory-grade AI approach: Easy to maintain
  • No re-verification with audit teams required
  • No need to pass model review boards repeatedly

Taking a step back, we know that a machine learning model is REALLY GOOD at finding nuanced human behavior in text, like the secrecy example above. What it is NOT good at is combining lots of other ideas (for instance, secrecy but not disclaimers, not secret cupcake eating, not surprise parties) into a single monolithic model. And it does not accommodate the idea of deterministic rules (meaning, there are some cases where analysts will ALWAYS want to see an alert).

With Scenarios, we can let the machine-learning model do what it does best (find secrecy language), and allow other components (such as lexicons, rules, or additional models) to take care of the other tasks (such as filtering out disclaimers and non-work-related topics).

The Scenario framework also gives analysts the ability to add their own deterministic rules and their own controllable, transparent lexicons, which they can layer on top of the models, giving them peace of mind that they will, for SURE, always get an alert when they KNOW they want one. Using this approach, we can leverage the power of machine learning while at the same time providing the predictability, control, and explainability required in a regulatory environment.
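Putting the secrecy example together, here is a hedged sketch of a Scenario in which the model only scores secrecy language while lexicons handle disclaimers, off-topic secrecy, and a deterministic always-alert rule. The patterns, the stub model, the 0.8 threshold, and the "off-channel" phrase used as the must-see rule are all hypothetical choices for illustration, not Smarsh's implementation.

```python
import re
from typing import Callable

# Hypothetical, analyst-maintained components.
DISCLAIMER = re.compile(
    r"the information contained in this e-?mail.*?unauthorized recipients\.?",
    re.IGNORECASE | re.DOTALL,
)
OFF_TOPIC = re.compile(r"\b(cupcakes?|recipes?|surprise party|secret santa)\b", re.IGNORECASE)
ALWAYS_ALERT = re.compile(r"\boff[- ]channel\b", re.IGNORECASE)  # hypothetical must-see phrase


def secrecy_scenario(text: str, secrecy_score: Callable[[str], float], threshold: float = 0.8) -> bool:
    # 1. Deterministic rule: checked on the raw text, so it fires regardless of the model.
    if ALWAYS_ALERT.search(text):
        return True
    # 2. Strip disclaimer boilerplate so the model never scores it.
    body = DISCLAIMER.sub("", text)
    # 3. Suppress secrecy about topics compliance does not care about.
    if OFF_TOPIC.search(body):
        return False
    # 4. The model does what it is good at: scoring secrecy language.
    return secrecy_score(body) >= threshold


def stub_secrecy_score(text: str) -> float:
    """Stand-in for a trained secrecy model; returns made-up probabilities."""
    return 0.9 if re.search(r"\b(confidential|anyone|secret)\b", text, re.IGNORECASE) else 0.1


disclaimer_email = (
    "Thanks, see attached.\n"
    "The information contained in this email communication may be confidential. "
    "Do not distribute to unauthorized recipients."
)
print(stub_secrecy_score(disclaimer_email) >= 0.8)             # True: the raw model alerts
print(secrecy_scenario(disclaimer_email, stub_secrecy_score))  # False: disclaimer removed
print(secrecy_scenario("don't tell anyone I took the last cupcake, lol", stub_secrecy_score))  # False
print(secrecy_scenario("don't mention this block trade to anyone", stub_secrecy_score))        # True
print(secrecy_scenario("let's take this off-channel", stub_secrecy_score))                    # True
```

The ordering is a design choice: the always-alert rule is evaluated first so that neither the disclaimer filter nor the off-topic filter can suppress it.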

Scenarios offer a pragmatic approach that uses every tool available, so analysts and compliance teams can leverage AI/ML and effectively own their risk in the field without being encumbered by the ongoing maintenance of ML models.

Elevate compliance and risk management with Smarsh

Artificial intelligence and machine learning continue to provide value to financial services and will keep evolving as financial institutions take a pragmatic approach to surfacing risk and maintaining compliance. Contact us for a deeper analysis of your supervision and surveillance needs and a frank discussion of how Smarsh Enterprise Conduct can help your team maintain efficiency in your review process.
