Skip to main content
< All Topics
Print

Saving the group (class) vector table for future use

In the present example, the data tables is located in directory: C:\Projets_En_Cours\AI_MTPL__BasingHall\25-May-2021\Results

write.csv(grp,”C:/Projets_En_Cours/AI_MTPL/__BasingHall/25-May-2021/Results/grp.csv”, sep=”,”)

NOTE -> The group vector grp is exported and then modified for transfer of a few odd companies into of a group 6 for outliers

Comparing the Retained and Dismissed groups of companies for testing the “fairness” of the selection process

This experiment is meant to test the usability of the company classification based on their PermID “Intelligent Tagging” scores. It is using the section of the sections of the questionnaire named: “Status”. It is aiming at providing an answer to the question:
“Does it exist a bias in the way Basinghall Partners retains or dismisses a particular startup after it has submitted its statement of interest?”
This is what we have called “fairness analysis”. In other words, based on the classification presented above, is it possible to demonstrate that the selection of dossiers processed is an unbiased (i.d., fair) process?

Determining the amount of similiraties between the Retained and Dismissed groups through their PermID characterisation

In order to test this hypothesis, we have separated the population of startups in two sub-groups:

  • Dismissed: for Status, 21. Discontinued or 22. Dismissed
  • Retained: for all the other values of Status

We compare each of these classes through the statistical properties of their corresponding K-means sub-groups. This is done in the factor space (2,3)

# Composite plot of ellipses with status
grp <- as.factor(grp_modified$GrwithStat)
# Here we want to have the ellipses of Status=Dismissed in grey
# See deriving color palette for 9 classes : http://www.sthda.com/english/wiki/colors-in-r
fviz_pca_biplot(occ.pca, axes = c(2, 3),
habillage = grp,
palette = c(“#FF0099”, “#993FFF”, “#0066CC”, “#33CCCC”, “#009966”, “#AAAAAA”,”#AAAAAA”, “#AAAAAA”, “#AAAAAA”, “#AAAAAA”, “#AAAAAA”, “#AAAAAA”),
addEllipses = TRUE)

This diagram shows that for 5 groups, the “dispersion ellipses” of the Dismissed sub-groups (showed in grey) overlap almost perfectly with the “dispersion ellipses” of that of original population of companies.
From these results it is concluded that the Basing Hall Partners process is reasonably “fair”

It is also common to represent each groups by their average vector of category scores.

# Statitics for each group of companies
grp_stat <- describeBy(OCC_wStatus[,c(4:14,15:17)],grp_modified$Group, mat=TRUE)

# Statitics for each group of companies broke down by Dismissed-Retained categories
gpr_statDR <- describeBy(OCC_wStatus[,c(4:14,15:17)],grp_modified$GrwithStat, mat=TRUE)

In the above table the highest values category scores are presented in yellow. It shows that all the companies’ activities involve Internet with added specificities:

  • GR1: “Business-Finance”
  • GR2: “Hospitality-Recreation with financing”
  • GR3: “Helth_Pharmacy”
  • GR4: “Environment Agriculture”
  • GR5: “Sports”
Table of Contents