Retrieval-Augmented Generation (RAG) combined with ML can support proactive risk identification: predictive analysis helps flag potential issues of unbalanced customer selection.
Open code in R on GitHub
The problem here is to test whether, in a venture capital (VC) firm, the data-driven process of discovering and selecting startup dossiers is unbiased and compliant with a declared principle of “fairness”. Apart from the usual financial assessment, the data-driven selection relies on the descriptions each startup provides, such as its value proposition, its customers’ pain points, and a list of top benefits for customers.
Using semantic tagging from the LSEG-PermID (Open Calais) service, we propose to replace the traditional, cumbersome manual process of sourcing and screening startup companies with a machine learning (ML) process. Our open code on GitHub offers a step-by-step implementation in R of the internal rating models approach presented in:
How it works in practice:
- Characterize each startup's activity using a natural language processing (NLP) tagging system.
- Apply a K-means clustering algorithm to group the startups by activity.
- Test whether the selection or dismissal of their dossiers is a “fair” process.
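
The three steps above can be sketched end to end. What follows is a minimal illustration in Python, not the R implementation from the GitHub repository. It rests on several assumptions: the tags and selection outcomes are made-up toy data (no call to the Open Calais service is made), the K-means routine uses a deterministic farthest-point initialisation chosen for reproducibility, and “fairness” is interpreted here as statistical independence between a startup's activity cluster and the selection decision, checked with a chi-square statistic and a permutation test. All function names are hypothetical.

```python
import random
from collections import Counter

def tag_vectors(tag_lists):
    """Step 1 (stand-in): encode each startup's tags as a binary bag-of-tags vector."""
    vocab = sorted({t for tags in tag_lists for t in tags})
    return [[1.0 if t in tags else 0.0 for t in vocab] for tags in tag_lists]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Step 2: plain Lloyd-style K-means.
    Assumption of this sketch: deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi_square(labels, selected):
    """Chi-square statistic of the cluster x decision contingency table."""
    n = len(labels)
    rows, cols = Counter(labels), Counter(selected)
    observed = Counter(zip(labels, selected))
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def permutation_p_value(labels, selected, n_perm=2000, seed=0):
    """Step 3: share of random relabelings at least as extreme as the data."""
    rng = random.Random(seed)
    observed = chi_square(labels, selected)
    shuffled = list(selected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(labels, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: (tags, dossier selected?)
startups = [
    (["fintech", "payments"], True),
    (["fintech", "lending"], True),
    (["fintech", "payments", "lending"], True),
    (["fintech", "payments"], True),
    (["health", "biotech"], False),
    (["health", "diagnostics"], False),
    (["health", "biotech", "diagnostics"], False),
    (["health", "biotech"], False),
]
vectors = tag_vectors([tags for tags, _ in startups])
selected = [sel for _, sel in startups]

labels = kmeans(vectors, k=2)
stat = chi_square(labels, selected)
p_value = permutation_p_value(labels, selected)
print("clusters:", labels)
print("chi-square:", stat, "permutation p-value:", round(p_value, 4))
```

In this toy data every selected dossier falls in one activity cluster and every dismissed dossier in the other, so a small p-value would signal that selection depends on activity, which under the assumption above counts as a potential fairness issue worth reviewing. With real dossiers, the number of clusters and the significance threshold are choices that need their own justification.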