Purpose

This exercise tests whether the data-driven process that a venture capital (VC) firm uses to discover and select startup dossiers is unbiased and compliant with a declared principle of “fairness”. Apart from the usual financial assessment, the data-driven selection relies on the descriptions provided with each dossier, such as the value proposition, the customers’ pain points and a list of top benefits for customers.

We propose to replace the traditional, cumbersome manual process of startup sourcing and screening with a Machine Learning (ML) process in three steps:

  • characterise each startup's activity using a Natural Language Processing (NLP) tagging system
  • group the startups by activity using a K-means clustering algorithm
  • test whether the selection or dismissal of their dossiers is a “fair” process
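
As a rough illustration of the first two steps, the sketch below turns tag sets into binary indicator vectors and groups them with a plain K-means (Lloyd's algorithm). It is a minimal sketch under stated assumptions, not the code from the linked repository: it is written in Python rather than R, the startup names and tags are invented, the tags are assumed to have been extracted already (no tagging service is called), it clusters the raw tag vectors rather than PCA factor scores, and the deterministic farthest-point initialisation is a choice made here for reproducibility.

```python
def tag_matrix(tagged):
    """Turn {startup_name: set_of_tags} into binary tag-indicator vectors."""
    vocab = sorted({tag for tags in tagged.values() for tag in tags})
    names = sorted(tagged)
    rows = [[1.0 if tag in tagged[name] else 0.0 for tag in vocab] for name in names]
    return names, vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(rows, k):
    """Farthest-point initialisation (an assumption of this sketch):
    start from the first row, then repeatedly add the row that is
    farthest from the centroids chosen so far."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    centroids = init_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lowest index).
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical, already-tagged dossiers (invented for illustration).
tagged = {
    "fin_a": {"payments", "banking"},
    "fin_b": {"payments", "banking", "lending"},
    "fin_c": {"banking", "lending"},
    "bio_a": {"genomics", "diagnostics"},
    "bio_b": {"genomics", "diagnostics", "biotech"},
    "bio_c": {"diagnostics", "biotech"},
}

names, vocab, rows = tag_matrix(tagged)
labels = kmeans(rows, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

On real dossiers the tag vocabulary is much larger and sparser, which is one reason to reduce the dimension first (for example with PCA) before clustering.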

Method

Preprocessing – Semantic tagging using the LSEG-PermID (Open Calais) service -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/permid-preprocess)

Step1 – Load data and libraries in R -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step1)

Step2 – Perform Principal Component Analysis (PCA) and evaluate clustering potential in various factor-score subspaces -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step2)

Step3 – Perform K-means on a factor-score subspace and evaluate performance with a minimum number of clusters -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step3)

Step4 – Display statistics for the retained clusters -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step4)

Step5 – Evaluate fairness of the startup dossier selection process -> (https://github.com/MoiraCorp/Innovkg-exercise-km/tree/main/step5)
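
The steps above end with a fairness evaluation, and one possible reading of it can be sketched as follows. This page does not state which fairness criterion Step5 applies, so the sketch assumes that “fair” means the selection outcome is statistically independent of the activity cluster. It computes per-cluster selection rates, the Pearson chi-square statistic of the cluster-by-outcome table, and a permutation p-value. The cluster labels and decisions are invented, the function names are hypothetical, and the code is Python rather than the R used in the linked repository.

```python
import random
from collections import Counter


def selection_rates(clusters, selected):
    """Share of selected dossiers within each cluster."""
    totals = Counter(clusters)
    chosen = Counter(c for c, s in zip(clusters, selected) if s)
    return {c: chosen[c] / totals[c] for c in totals}


def chi_square_stat(clusters, selected):
    """Pearson chi-square statistic for the cluster x outcome table."""
    n = len(clusters)
    cluster_totals = Counter(clusters)
    outcome_totals = Counter(selected)
    observed = Counter(zip(clusters, selected))
    stat = 0.0
    for c, n_c in cluster_totals.items():
        for s, n_s in outcome_totals.items():
            expected = n_c * n_s / n  # count expected under independence
            stat += (observed[(c, s)] - expected) ** 2 / expected
    return stat


def permutation_p_value(clusters, selected, n_perm=2000, seed=0):
    """Estimate how often shuffled decisions look at least as
    cluster-dependent as the observed ones."""
    rng = random.Random(seed)
    observed_stat = chi_square_stat(clusters, selected)
    shuffled = list(selected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between cluster and outcome
        if chi_square_stat(clusters, shuffled) >= observed_stat - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical data: 20 dossiers in two clusters, True = selected.
clusters = [0] * 10 + [1] * 10
biased = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
balanced = ([True] * 5 + [False] * 5) * 2

print("rates (biased):  ", selection_rates(clusters, biased))
print("chi2  (biased):  ", chi_square_stat(clusters, biased))
print("p     (biased):  ", permutation_p_value(clusters, biased))
print("chi2  (balanced):", chi_square_stat(clusters, balanced))
```

A small p-value indicates that selection depends on the activity cluster more than chance would explain. Whether that counts as unfair depends on the declared fairness principle, since some dependence on activity may be intended by the investment thesis.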
