Compliance Testing – Fairness Assessment using R

Retrieval Augmented Generation (RAG) combined with machine learning (ML) can support proactive risk identification: predictive analysis helps flag potential issues such as unbalanced customer selection.

Open code in R on GitHub

The problem here is to test whether, in a VC firm, the data-driven process of discovering and selecting startup dossiers is unbiased and complies with a declared principle of “fairness”. Apart from the usual financial assessment, the data-driven selection relies on the descriptions provided with each dossier: the value proposition, the customers’ pain points and a list of top benefits for customers.

Using semantic tagging from the LSEG-PermID (Open Calais) service, we propose replacing the traditional, cumbersome manual process of sourcing and screening startups with a Machine Learning (ML) process. Our open code on GitHub offers a step-by-step implementation in R of the internal rating models approach presented in:

How it works in practice:

  1. Characterize each startup's activity using a Natural Language Processing (NLP) tagging system.
  2. Apply a K-means clustering algorithm to group the startups by activity.
  3. Test whether the selection or dismissal of their dossiers is a “fair” process.
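
The three steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code from the repository: the tag matrix, the selection flags and the number of clusters (`k = 3`) are synthetic and hypothetical, and a chi-squared test of independence between activity cluster and selection outcome stands in for whichever fairness test the original implementation uses.

```r
# Minimal sketch with synthetic data; all names and values are hypothetical.
set.seed(42)

n_startups <- 120
tag_names <- c("fintech", "health", "energy", "retail", "ai", "logistics")

# Step 1 (stand-in): a binary startup-by-tag matrix, as a tagging service
# might produce it. Here it is simulated from three latent activity profiles.
profiles <- rbind(
  c(0.9, 0.1, 0.1, 0.2, 0.6, 0.1),
  c(0.1, 0.9, 0.1, 0.1, 0.5, 0.1),
  c(0.1, 0.1, 0.8, 0.3, 0.2, 0.8)
)
latent <- sample(1:3, n_startups, replace = TRUE)
tags <- matrix(
  rbinom(n_startups * length(tag_names), size = 1, prob = profiles[latent, ]),
  nrow = n_startups,
  dimnames = list(NULL, tag_names)
)

# Step 2: group startups by activity with K-means (k = 3 is an assumption).
km <- kmeans(tags, centers = 3, nstart = 25)

# Hypothetical selection outcome: TRUE if the dossier was retained.
selected <- rbinom(n_startups, size = 1, prob = 0.3) == 1

# Step 3: test whether selection is independent of the activity cluster.
contingency <- table(cluster = km$cluster, selected = selected)
fairness_test <- chisq.test(contingency)

print(contingency)
print(fairness_test$p.value)
```

A small p-value would suggest that the selection rate differs across activity clusters, which warrants a closer look; it does not by itself establish that the process is unfair. When some clusters contain only a few dossiers, `fisher.test(contingency)` is a common alternative to the chi-squared approximation.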

