This is the methodology used for the Trend CT story: Who, where and how often people are stunned by police in Connecticut

Visit the repo for the data used in this analysis. (Also, check out the reproducible scripts and data behind many of our other stories in our central repo.)

Data for this analysis was provided by CCSU’s Institute for Municipal and Regional Policy, compiled from reports gathered by individual police departments as mandated by Connecticut law. However, this is a new form of data collection, so data quality varies from department to department.

What’s in this script

Several visualizations and tables exploring the data

Bonus: As an interesting exercise, my colleague Jake Kara also did a data analysis for this story, but with Python.

We livestreamed our process: Jake with Python [video] and me with R [video]. (Sorry, I didn’t realize I was on mute for the first part of mine.)

Preparing the data

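library(readxl)  # read_excel()
library(dplyr)   # group_by(), summarise(), mutate(), select(), %>%
library(tidyr)   # spread()

stuns <- read_excel("data/2015 Reported Taser Data.xlsx", sheet=1)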

# You can also download the data here

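# Some column labels sit in the first data row (a two-row header): use them
# where present, otherwise keep the name read_excel assigned
header_row <- as.character(unlist(stuns[1, ]))
colnames(stuns) <- ifelse(is.na(header_row), colnames(stuns), header_row)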

stuns <- stuns[-1,]

colnames(stuns) <- make.names(colnames(stuns))
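To see what the header clean-up does, here is a minimal sketch on a tiny made-up data frame. The column names and values are hypothetical stand-ins, not the real spreadsheet; it assumes, as the code above does, that a missing label in the first row means "keep the existing column name." Only base R is needed.

```r
# Hypothetical stand-in for a sheet whose labels are split across two rows:
# the first data row holds a label for some columns and NA for others
raw <- data.frame(
  `Law Enforcement Agency` = c(NA, "Town A", "Town B"),
  X2 = c("Time of Incident", "0.5", "0.75"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Use the first row's label where present, otherwise keep the existing name
header_row <- as.character(unlist(raw[1, ]))
colnames(raw) <- ifelse(is.na(header_row), colnames(raw), header_row)

# Drop the label row, then make the names syntactic (spaces become dots)
raw <- raw[-1, ]
colnames(raw) <- make.names(colnames(raw))

# colnames(raw) is now c("Law.Enforcement.Agency", "Time.of.Incident")
```

make.names() is what turns a label like "Time of Incident" into Time.of.Incident, which is why the later code can use `stuns$Time.of.Incident` without backticks.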



# Cleaning up the data

colnames(stuns) <- c("Law.Enforcement.Agency",

# Count Hispanic as its own category, overriding Race, so each person falls into one group
stuns$race_ethnicity <- ifelse(stuns$Hispanic==1, "Hispanic", stuns$Race)
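Because the `Hispanic` flag takes precedence over `Race`, a person recorded as, say, White and Hispanic is counted only as Hispanic. A toy illustration with made-up rows (the values are hypothetical):

```r
# Hypothetical records: Hispanic is a 0/1 flag, Race is the reported race
toy <- data.frame(
  Race     = c("White", "Black", "White", "Asian"),
  Hispanic = c(0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# A Hispanic flag of 1 wins; otherwise fall back to the reported race
toy$race_ethnicity <- ifelse(toy$Hispanic == 1, "Hispanic", toy$Race)

toy$race_ethnicity
```

This yields "White", "Hispanic", "Hispanic", "Asian": the second and third people are counted as Hispanic regardless of their recorded race.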

Stun incidents by race in the state

by_state <- stuns %>%
  group_by(race_ethnicity) %>%
  summarise(total=n()) %>%
  mutate(percent=round(total/sum(total)*100,2))

race_ethnicity  total  percent
Asian               2     0.33
Black             187    30.66
Hispanic          130    21.31
Unknown             1     0.16
White             290    47.54
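The percent column is each group's share of all 610 stun incidents (2 + 187 + 130 + 1 + 290). Recomputing it from the counts in the table is a quick check on the arithmetic:

```r
# Counts copied from the statewide table above
counts <- c(Asian = 2, Black = 187, Hispanic = 130, Unknown = 1, White = 290)

# Share of all stun incidents, rounded to two decimals as in the table
percent <- round(counts / sum(counts) * 100, 2)

sum(counts)  # 610 incidents in total
percent
```

The rounded shares also happen to add up to exactly 100.00 here, so no group is lost to rounding.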

Total stun incidents per department


by_dept_total <- stuns %>%
  group_by(Law.Enforcement.Agency, race_ethnicity) %>%
  summarise(total=n()) %>%
  spread(race_ethnicity, total)
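As a cross-check that needs only base R, `table()` builds the same department-by-race counts. One difference worth knowing: where a department has no incidents for a group, `table()` reports 0, whereas `spread()` leaves `NA` unless you pass `fill = 0`. The rows below are made up for illustration.

```r
# Hypothetical incident rows standing in for the stuns data frame
toy <- data.frame(
  Law.Enforcement.Agency = c("Town A", "Town A", "Town A", "Town B", "Town B"),
  race_ethnicity         = c("White", "Black", "White", "Hispanic", "White"),
  stringsAsFactors = FALSE
)

# Rows are departments, columns are race/ethnicity, cells are incident counts
by_dept_table <- table(toy$Law.Enforcement.Agency, toy$race_ethnicity)
by_dept_table
```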


Stun incidents per department by race

by_dept_percent <- stuns %>%
  group_by(Law.Enforcement.Agency, race_ethnicity) %>%
  summarise(total=n()) %>%
  mutate(percent=round(total/sum(total)*100,2)) %>%
  select(Law.Enforcement.Agency, race_ethnicity, percent) %>%
  spread(race_ethnicity, percent)
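Because `summarise()` drops only the last grouping variable, the result is still grouped by `Law.Enforcement.Agency`, so `sum(total)` inside `mutate()` is each department's own total and every row of percentages adds up to about 100. The same within-department shares in base R, on made-up rows:

```r
# Hypothetical incident rows
toy <- data.frame(
  Law.Enforcement.Agency = c("Town A", "Town A", "Town A", "Town A", "Town B"),
  race_ethnicity         = c("White", "Black", "White", "White", "Hispanic"),
  stringsAsFactors = FALSE
)

counts <- table(toy$Law.Enforcement.Agency, toy$race_ethnicity)

# margin = 1 divides each cell by its row (department) total
by_dept_pct <- round(prop.table(counts, margin = 1) * 100, 2)
by_dept_pct
```

Here Town A comes out 75% White and 25% Black, and Town B 100% Hispanic. With small departments a single incident swings these shares a lot, so it helps to read them next to the raw counts.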


Time of stun incidents

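library(openxlsx)  # convertToDateTime()

# Times come through as Excel day fractions; the origin date is arbitrary
# because only the hour is used below
stuns$Time.of.Incident <- convertToDateTime(as.numeric(stuns$Time.of.Incident), origin = "2016-07-04")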

library(lubridate)  # hour(), month()
stuns$hour <- hour(stuns$Time.of.Incident)
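Assuming, as the `as.numeric()` call suggests, that `Time.of.Incident` is stored as an Excel time (a fraction of a 24-hour day, so 0.5 is noon), the hour can also be pulled out with base R alone. This sketch uses made-up values and pins the time zone to GMT so no local offset shifts the hour; it is an alternative to, not a copy of, the openxlsx/lubridate route above.

```r
# Hypothetical Excel times: 12:00, 18:00, about 09:30, and a missing value
excel_time <- c(0.5, 0.75, 0.3958333, NA)

# Option 1: arithmetic. Round to whole seconds first so floating-point noise
# (e.g. 8.9999999 hours) is not floored to the wrong hour
hour_arith <- floor(round(excel_time * 86400) / 3600)

# Option 2: build a date-time on an arbitrary day (the date part is ignored)
# and read the hour back, working in GMT so no time-zone shift is applied
as_time <- as.POSIXct(round(excel_time * 86400), origin = "2016-07-04", tz = "GMT")
hour_posix <- as.integer(format(as_time, "%H"))

hour_arith  # 12 18 9 NA
```

Missing times stay `NA` in both versions, which is what ggplot2 later reports as rows removed.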

library(ggplot2)  # ggplot(), geom_histogram(), geom_bar()

ggplot(stuns, aes(hour)) + geom_histogram(binwidth=1)
## Warning: Removed 2 rows containing non-finite values (stat_bin).
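The warning means two incidents have no usable time (their hour is `NA`), so they drop out of both hourly charts. Counting incidents per hour directly, with made-up hours, shows the same behavior:

```r
# Hypothetical hours, two of them missing (like the 2 rows ggplot2 removed)
hours <- c(0, 1, 1, 23, NA, 14, NA)

# Fixing the levels to 0:23 keeps hours with no incidents as explicit zeros;
# table() drops NA values by default
per_hour <- table(factor(hours, levels = 0:23))

sum(per_hour)      # 5 incidents with a known hour
sum(is.na(hours))  # 2 incidents without one
```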

# Time of stun incidents by race

ggplot(stuns, aes(hour, fill=race_ethnicity)) + geom_histogram(binwidth=1)
## Warning: Removed 2 rows containing non-finite values (stat_bin).


# Excel stores dates as days since 1899-12-30; as.POSIXct() expects seconds
stuns$Date.of.Incident <- as.POSIXct(as.numeric(stuns$Date.of.Incident) * (60*60*24)
                                     , origin="1899-12-30"
                                     , tz="GMT")
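Excel counts dates as days from 30 December 1899 (the origin commonly used in R to line up with Excel's serial numbers for modern dates), while `as.POSIXct()` expects seconds, hence the multiplication by 60*60*24. The check below uses an example serial number, not a value from the data:

```r
# Excel serial date for 1 January 2015 (days counted from 1899-12-30)
serial <- 42005

# Route used above: days -> seconds, then a GMT date-time
as_datetime <- as.POSIXct(serial * (60 * 60 * 24), origin = "1899-12-30", tz = "GMT")

# Equivalent route when only the calendar date matters
as_date <- as.Date(serial, origin = "1899-12-30")

format(as_datetime, "%Y-%m-%d")  # "2015-01-01"
as_date                          # "2015-01-01"
```

Using GMT in the date-time route keeps midnight from slipping into the previous day when the session runs in a time zone behind UTC.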

stuns$month <- month(stuns$Date.of.Incident, label=TRUE)

ggplot(stuns, aes(month)) + geom_bar()
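`month(..., label = TRUE)` from lubridate turns each date into an ordered month factor, and `geom_bar()` counts rows per month. The same tally in base R, on made-up dates, is a quick way to read the numbers behind the bar chart:

```r
# Hypothetical incident dates
dates <- as.Date(c("2015-01-03", "2015-01-20", "2015-03-15", "2015-12-31"))

# Label each date with base R's built-in English month abbreviations,
# keeping all 12 months so empty months show up as zero
month_label <- factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)
per_month <- table(month_label)
per_month
```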