Compliance Testing - Fairness Assessment using R
- Preprocessing - Semantic tagging using the LSEG-PermID (Open Calais) service
- Step 1 - Load data and libraries
- Step 2 - Perform Principal Component Analysis (PCA) and evaluate clustering potential
- Step 3 - Perform K-means on a factor scores sub-space and evaluate performance with a minimum number of clusters
- Step 4 - Display retained clusters statistics
- Step 5 - Evaluate fairness of Basing Hall-BIC selection process
The problem here is to test whether, in a VC firm, the data-driven process of discovering and selecting startup dossiers is unbiased and compliant with a declared principle of “fairness”. Apart from the usual financial assessment, the data-driven selection is based on descriptions provided with each dossier, such as the value proposition, the customers’ pain points and a list of top benefits for customers.
We propose to replace the traditional, cumbersome manual process of startup sourcing and screening with a Machine Learning (ML) process in three steps:
- activity characterisation using a Natural Language Processing (NLP) tagging system
- a K-means clustering algorithm that groups the startups by their activity
- a test of whether the selection or dismissal of their dossiers is a “fair” process
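The first two steps can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code from the repository: the tag matrix is synthetic, and the names `tags`, `tag_prob`, `group` and the choice of two components and three clusters are hypothetical. In the real exercise the tags would come from the PermID (Open Calais) preprocessing.

```r
# Minimal sketch on synthetic data; all names and values are hypothetical.
set.seed(42)

n_per_group <- 20
# Tag probabilities for three synthetic activity groups (rows) over 8 tags (columns)
tag_prob <- rbind(
  c(0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5),
  c(0.1, 0.1, 0.9, 0.8, 0.1, 0.1, 0.5, 0.5),
  c(0.1, 0.1, 0.1, 0.1, 0.9, 0.8, 0.5, 0.5)
)
group <- rep(1:3, each = n_per_group)

# One row per startup, one 0/1 column per semantic tag
tags <- t(sapply(group, function(g) {
  rbinom(ncol(tag_prob), size = 1, prob = tag_prob[g, ])
}))
colnames(tags) <- paste0("tag_", seq_len(ncol(tags)))

# PCA on the centred tag matrix; pca$x holds the factor scores
pca <- prcomp(tags, center = TRUE, scale. = FALSE)
explained <- pca$sdev^2 / sum(pca$sdev^2)

# K-means on the sub-space spanned by the first two components
scores <- pca$x[, 1:2]
km <- kmeans(scores, centers = 3, nstart = 25)

# Share of variance carried by the retained components
print(round(explained[1:2], 2))
# Cross-tabulate the recovered clusters against the synthetic groups
print(table(cluster = km$cluster, synthetic_group = group))
```

Clustering on a few factor scores rather than on the raw tags keeps the distance computation in a low-dimensional space. How many components and clusters to retain is a judgment call that the steps below address.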
Preprocessing – Semantic tagging using the LSEG-PermID (Open Calais) service -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/permid-preprocess)
Step 1 – Load data and libraries in R -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step1)
Step 2 – Perform Principal Component Analysis (PCA) and evaluate clustering potential in various factor scores subspaces -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step2)
Step 3 – Perform K-means on a factor scores sub-space and evaluate performance with a minimum number of clusters -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step3)
Step 4 – Display retained clusters statistics -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step4)
Step 5 – Evaluate fairness of the startup dossier selection process -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step5)
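One possible reading of the fairness check is that the selection rate should not depend on the activity cluster a dossier falls into. The sketch below tests that with a chi-squared test of independence. This is an assumption made for illustration and may differ from the method used in the step5 folder; the counts and the names `cluster`, `outcome` and `contingency` are hypothetical.

```r
# Hypothetical data: 90 dossiers in 3 clusters, with made-up selection counts.
cluster <- factor(rep(1:3, each = 30))
outcome <- factor(
  c(rep(c(1, 0), times = c(10, 20)),   # cluster 1: 10 selected, 20 dismissed
    rep(c(1, 0), times = c(9, 21)),    # cluster 2:  9 selected, 21 dismissed
    rep(c(1, 0), times = c(11, 19))),  # cluster 3: 11 selected, 19 dismissed
  levels = c(0, 1),
  labels = c("dismissed", "selected")
)

# Cluster-by-outcome contingency table
contingency <- table(cluster = cluster, outcome = outcome)

# Selection rate within each cluster
selection_rate <- prop.table(contingency, margin = 1)[, "selected"]
print(round(selection_rate, 3))

# Chi-squared test of independence between cluster and outcome.
# A large p-value gives no evidence that selection depends on the cluster.
test <- chisq.test(contingency)
print(test)
```

With small cell counts, `fisher.test(contingency)` is a common alternative to the chi-squared approximation.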