We will be using the dataset at https://github.com/propublica/compas-analysis/raw/master/compas-scores-two-years.csv. Reading it in:
library(tidyverse)   # dplyr verbs and ggplot2
library(reshape2)    # melt(), used for reshaping data frames below
set.seed(0)
compas <- read.csv("https://github.com/propublica/compas-analysis/raw/master/compas-scores-two-years.csv")
First, we’ll obtain the set we’ll be working with:
idx <- sample(1:nrow(compas))
idx.train <- idx[1:5000]
idx.test <- idx[5001:6000]
idx.valid <- idx[6001:7000]
compas.train <- compas[idx.train,]
compas.test <- compas[idx.test,]
compas.valid <- compas[idx.valid,]
We first explore whether white and black defendants get the same COMPAS scores.
compas.bw <- compas.train %>% filter(race %in% c("African-American", "Caucasian"))
ggplot(compas.bw, mapping = aes(x = decile_score)) +
geom_histogram(bins = 10, mapping = aes(y = ..density..)) +
scale_x_continuous(breaks = c(1:10)) +
facet_wrap(~race) +
xlab("Decile score")
For African-American defendants, the distribution of the scores is approximately uniform. For Caucasian defendants, many more get low scores than high scores.
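To put a number on this, one could compare the share of each group scoring in the top half of the scale, using the dplyr verbs loaded above (a quick check on the same training subset; output not shown):
compas.bw %>%
group_by(race) %>%
summarize(prop.high = mean(decile_score >= 5))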
Here, we are computing the FPR, FNR, and correct classification rate for different populations. First, we’ll define functions to compute the quantities needed.
# False positive rate: share of non-recidivists scored at or above the threshold
get.FPR <- function(compas.set, thr){
return(sum((compas.set$decile_score >= thr) & (compas.set$is_recid == 0))/sum(compas.set$is_recid == 0))
}
# False negative rate: share of recidivists scored below the threshold
get.FNR <- function(compas.set, thr){
return(sum((compas.set$decile_score < thr) & (compas.set$is_recid == 1))/sum(compas.set$is_recid == 1))
}
# Correct classification rate: share of defendants whose thresholded score agrees with is_recid
get.CCR <- function(compas.set, thr){
return(mean((compas.set$decile_score >= thr) == compas.set$is_recid))
}
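As a quick sanity check, these helpers can be applied to the whole training set at the midpoint threshold of 5 (output not shown):
get.FPR(compas.train, 5)
get.FNR(compas.train, 5)
get.CCR(compas.train, 5)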
Compute the sets we need:
compas.valid.b <- compas.valid %>% filter(race == "African-American")
compas.valid.w <- compas.valid %>% filter(race == "Caucasian")
Now, we can compute the numbers:
thr <- 5
fps <- c(get.FPR(compas.valid.b, thr), get.FPR(compas.valid.w, thr), get.FPR(compas.valid, thr))
fns <- c(get.FNR(compas.valid.b, thr), get.FNR(compas.valid.w, thr), get.FNR(compas.valid, thr))
ccr <- c(get.CCR(compas.valid.b, thr), get.CCR(compas.valid.w, thr), get.CCR(compas.valid, thr))
rates <- data.frame(FPR = fps, FNR = fns, CCR = ccr)
rownames(rates) <- c("black", "white", "all")
rates
## FPR FNR CCR
## black 0.4686192 0.3176895 0.6124031
## white 0.2401961 0.5174825 0.6455331
## all 0.3182674 0.3965885 0.6450000
We can see that the scores satisfy neither false positive parity (FPR of about 0.47 for black defendants versus 0.24 for white defendants) nor false negative parity (FNR of about 0.32 versus 0.52). The correct classification rates are close (0.61 versus 0.65), so classification parity approximately holds. Demographic parity is not satisfied either, since the histograms above show quite different score distributions for the two groups.
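As a rough check of demographic parity, we could also compare the proportion of defendants in each group flagged as high risk at the same threshold; a minimal sketch using the validation subsets defined above (output not shown):
mean(compas.valid.b$decile_score >= thr)
mean(compas.valid.w$decile_score >= thr)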
We will now see how changing the threshold influences the false positive, false negative, and correct classification rates.
# Return the FPR, FNR, and CCR (in that order) at a given threshold
get.rates <- function(thr, compas.set){
return(c(get.FPR(compas.set, thr), get.FNR(compas.set, thr), get.CCR(compas.set, thr)))
}
thrs <- seq(0.5, 9.5, 1)
rates.w <- sapply(thrs, FUN=get.rates, compas.valid.w)
rates.b <- sapply(thrs, FUN=get.rates, compas.valid.b)
rates.all <- sapply(thrs, FUN=get.rates, compas.valid)
# Plot the FPR, FNR, and CCR as a function of the threshold
plot.rates <- function(thrs, rates, caption){
rates.df <- data.frame(threshold = thrs, FPR = rates[1,], FNR = rates[2,], CCR = rates[3,])
# melt to long format: one row per (threshold, measure) pair
rates.df.tidy <- melt(rates.df, 1) %>% select(threshold = threshold, measure = variable, rate = value)
ggplot(rates.df.tidy, mapping = aes(x = threshold, y = rate, color = measure)) +
geom_smooth(se = F, method = "loess") + labs(title = caption)
}
We can now get three figures for the different demographics (let’s forgo using facets this time):
plot.rates(thrs, rates.w, "white defendants")
plot.rates(thrs, rates.b, "black defendants")
plot.rates(thrs, rates.all, "all defendants")
Let’s now fit a logistic regression model predicting is_recid from age and priors_count:
fit <- glm(is_recid ~ age + priors_count, family = binomial, data = compas.train)
fit$coefficients
## (Intercept) age priors_count
## 1.10503459 -0.05058575 0.16915555
An increase of 1 in the number of priors is associated with an increase of 0.17 in the log-odds of recidivism, all other things being equal.
An increase in age by one year corresponds to a decrease of 0.05 in the log-odds of recidivism.
(If we’re being a bit silly and decide to extrapolate) according to the model, a newborn with no priors would have a probability of \(\sigma(1.11) \approx 0.75\) of being re-arrested.
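These numbers can be double-checked directly from the fitted coefficients: plogis is the logistic sigmoid, and exponentiating a coefficient gives the corresponding odds ratio (a quick check, not part of the main analysis; output not shown):
plogis(coef(fit)["(Intercept)"]) # predicted probability at age 0 and priors_count 0, about 0.75
exp(coef(fit)["priors_count"]) # multiplicative change in the odds per additional prior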
Let’s now obtain the FPR, FNR, and CCR for our model, using the threshold \(0.5\).
# Analogues of the rate functions above, but using a fitted model's predicted probabilities thresholded at thr
get.FPR.fit <- function(thr, compas.set, fit){
pred <- predict(fit, newdata = compas.set, type = "response") > thr
return(sum((pred == 1) & (compas.set$is_recid == 0))/sum(compas.set$is_recid == 0))
}
get.FNR.fit <- function(thr, compas.set, fit){
pred <- predict(fit, newdata = compas.set, type = "response") > thr
return(sum((pred == 0) & (compas.set$is_recid == 1))/sum(compas.set$is_recid == 1))
}
get.CCR.fit <- function(thr, compas.set, fit){
pred <- predict(fit, newdata = compas.set, type = "response") > thr
return(mean(pred == compas.set$is_recid))
}
compas.train.b <- compas.train %>% filter(race == "African-American")
compas.train.w <- compas.train %>% filter(race == "Caucasian")
thr <- .5
# Rows: 1 = all defendants, 2 = Caucasian, 3 = African-American (same order in the validation table below)
demo.perf.dat.train <- data.frame(FP = c(get.FPR.fit(thr, compas.train, fit),
get.FPR.fit(thr, compas.train.w, fit),
get.FPR.fit(thr, compas.train.b, fit)),
FN = c(get.FNR.fit(thr, compas.train, fit),
get.FNR.fit(thr, compas.train.w, fit),
get.FNR.fit(thr, compas.train.b, fit)),
CCR = c(get.CCR.fit(thr, compas.train, fit),
get.CCR.fit(thr, compas.train.w, fit),
get.CCR.fit(thr, compas.train.b, fit)))
demo.perf.dat.train
## FP FN CCR
## 1 0.2805728 0.3600993 0.6810000
## 2 0.2102564 0.5072046 0.6662672
## 3 0.3754355 0.2676450 0.6843738
demo.perf.dat.valid <- data.frame(FP = c(get.FPR.fit(thr, compas.valid, fit),
get.FPR.fit(thr, compas.valid.w, fit),
get.FPR.fit(thr, compas.valid.b, fit)),
FN = c(get.FNR.fit(thr, compas.valid, fit),
get.FNR.fit(thr, compas.valid.w, fit),
get.FNR.fit(thr, compas.valid.b, fit)),
CCR = c(get.CCR.fit(thr, compas.valid, fit),
get.CCR.fit(thr, compas.valid.w, fit),
get.CCR.fit(thr, compas.valid.b, fit)))
demo.perf.dat.valid
## FP FN CCR
## 1 0.2768362 0.3752665 0.6770000
## 2 0.1764706 0.5314685 0.6772334
## 3 0.4016736 0.2563177 0.6763566
The training and validation rates are very close (e.g. an overall CCR of 0.681 versus 0.677), so it appears that there is basically no overfitting.
Let’s create the two hypothetical people and predict the probability of re-arrest for both:
dat <- data.frame(age = c(30, 30), priors_count = c(2, 3))
predict(fit, newdata = dat, type = "response")
## 1 2
## 0.4814518 0.5237144
We observe an increase of about 4 percentage points in the predicted probability (from roughly 0.48 to 0.52).
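The difference can be read off directly from the two predictions:
preds <- predict(fit, newdata = dat, type = "response")
preds[2] - preds[1] # about 0.04, per the output above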
We want to find a threshold for black defendants at which their false positive rate matches the white defendants’ false positive rate at the default threshold of 0.5. Let’s see what the rates are for different thresholds.
thr <- c(0.3, 0.4, 0.51, 0.52, 0.53, 0.54, 0.55, 0.553, 0.555, 0.558, 0.56, 0.57, 0.58, 0.59, 0.7, 0.8)
fp.rates <- data.frame(thr = thr,
FP.b = sapply(thr, get.FPR.fit, compas.train.b, fit),
FP.w = sapply(thr, get.FPR.fit, compas.train.w, fit))
fp.rates
## thr FP.b FP.w
## 1 0.300 0.82926829 0.609230769
## 2 0.400 0.66811847 0.449230769
## 3 0.510 0.35365854 0.192820513
## 4 0.520 0.30313589 0.160000000
## 5 0.530 0.26742160 0.137435897
## 6 0.540 0.25348432 0.131282051
## 7 0.550 0.22560976 0.110769231
## 8 0.553 0.21080139 0.100512821
## 9 0.555 0.20993031 0.100512821
## 10 0.558 0.20209059 0.093333333
## 11 0.560 0.20034843 0.093333333
## 12 0.570 0.17857143 0.084102564
## 13 0.580 0.17073171 0.078974359
## 14 0.590 0.15766551 0.072820513
## 15 0.700 0.08101045 0.026666667
## 16 0.800 0.04181185 0.008205128
We only need to tweak the threshold for black defendants a little. In the plot below, the dashed vertical lines mark the default threshold of 0.5 and the adjusted threshold of 0.553, and the dashed horizontal line marks the FPR of about 0.21 at which the two groups match:
# FPR as a function of the threshold for each demographic, with the matched thresholds and FPR marked by dashed lines
plot.FPR <- function(fp.rates){
colnames(fp.rates) <- c("thr", "African-American", "Caucasian")
df.melted <- melt(fp.rates, "thr")
colnames(df.melted) <- c("thr", "demographic", "FPR")
ggplot(df.melted) +
geom_smooth(mapping=aes(x = thr, y = FPR, color = demographic), method = "loess", se = F) +
geom_vline(xintercept = 0.5, linetype = "dashed") +
geom_vline(xintercept = 0.553, linetype = "dashed") +
geom_hline(yintercept = 0.211, linetype = "dashed") +
ggtitle('FPR across thresholds')
}
plot.FPR(fp.rates)
A threshold of thr = 0.553 for black defendants, with white defendants kept at 0.5, seems about right.
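Instead of eyeballing a list of candidate thresholds, one could also search a fine grid for the black-defendant threshold whose training-set FPR is closest to the white-defendant FPR at 0.5; a small sketch using the helpers defined above (output not shown):
target <- get.FPR.fit(0.5, compas.train.w, fit) # white FPR at the default threshold
grid <- seq(0.5, 0.7, by = 0.001)
fpr.b <- sapply(grid, get.FPR.fit, compas.train.b, fit)
grid[which.min(abs(fpr.b - target))] # threshold that best matches the target FPR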
With these thresholds, the white and black demographics would be at (approximate) false positive parity. We’ll compute the combined correct classification rate on the training set, where the thresholds were chosen.
# Correct classifications with group-specific thresholds: 0.553 for black defendants, 0.5 for white defendants
n.correct <- sum(compas.train.b$is_recid == (predict(fit, newdata = compas.train.b, type = "response") > 0.553)) +
sum(compas.train.w$is_recid == (predict(fit, newdata = compas.train.w, type = "response") > 0.5))
n.total <- nrow(compas.train.b) + nrow(compas.train.w)
n.correct/n.total
## [1] 0.664548
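For comparison, the same calculation on the validation set (keeping 0.553 for black defendants and 0.5 for white defendants) would look as follows; the resulting rate is not shown here:
n.correct.valid <- sum(compas.valid.b$is_recid == (predict(fit, newdata = compas.valid.b, type = "response") > 0.553)) +
sum(compas.valid.w$is_recid == (predict(fit, newdata = compas.valid.w, type = "response") > 0.5))
n.correct.valid/(nrow(compas.valid.b) + nrow(compas.valid.w))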
Note that we ignored everyone who was neither white nor black. That is fine to do, but including the other demographics (in any way you like) is fine too.