Keeping with the theme of my previous data visualization blog post, I’m going to use my Data Visualization Gutenberg Block plugin to create a summary of the data from the Mapping Police Violence project.
Here’s the finished visualization. Continue below for a few quick thoughts on what the data is showing and a brief walkthrough of how I put the chart together.
Year | Killings | Charges | Convictions |
---|---|---|---|
2013 | 1,087 | 19 | 6 |
2014 | 1,049 | 20 | 11 |
2015 | 1,102 | 25 | 9 |
2016 | 1,070 | 19 | 5 |
2017 | 1,091 | 16 | 4 |
2018 | 1,144 | 14 | 1 |
2019 | 1,094 | 21 | 1 |
2020 | 1,125 | 16 | 1 |
Here are the basic steps for recreating the dataset, which you can also find in this GitHub repo.
- Download `MPVDatasetDownload.xlsx` from mappingpoliceviolence.org.
- Open the file, remove unnecessary columns, export to `MPVDatasetDownload.csv`.
- Run `python data-cleanup.py` to create `data-summary.csv`.
- Upload the data using my Data Visualization Gutenberg Block plugin.
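If you'd rather script the column removal in step 2 than do it by hand in a spreadsheet, here is a minimal sketch. It assumes you have already exported the full spreadsheet to CSV; the file name `MPVDatasetDownload-full.csv` and the helper `keep_columns` are made up for this example and are not part of the repo. The two column names are the ones the cleanup script reads.

```python
import csv

# The only two columns the cleanup script reads.
COLUMNS = [ "Date of Incident (month/day/year)", "Criminal Charges?" ]

def keep_columns( src_path, dst_path, columns = COLUMNS ):
    """Copy src_path to dst_path, keeping only the named columns."""
    with open( src_path, encoding = "utf-8", newline = "" ) as src, \
         open( dst_path, "w", encoding = "utf-8", newline = "" ) as dst:
        reader = csv.DictReader( src )
        # extrasaction="ignore" drops every column not listed in fieldnames
        writer = csv.DictWriter( dst, fieldnames = columns, extrasaction = "ignore" )
        writer.writeheader()
        for row in reader:
            writer.writerow( row )

# Usage (hypothetical file names):
# keep_columns( "MPVDatasetDownload-full.csv", "MPVDatasetDownload.csv" )
```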
The cleanup script is pretty straightforward.
```python
import csv
import json

# Helper function for retrieving the year from the dataset
def get_year( item ):
    return item["Year"]

# Convert CSV files to a dictionary
data = {}
with open( "MPVDatasetDownload.csv", encoding = "utf-8" ) as csvf:
    csvReader = csv.DictReader(csvf)
    key = 0
    for rows in csvReader:
        key += 1
        data[key] = rows

stats = []
for index in data:
    # Dates end in a two-digit year, so prefix it with "20"
    year = "20" + data[index]["Date of Incident (month/day/year)"][-2:]
    # Skip data for 2021 that's still in progress
    if year != "2021":
        years_processed = map( get_year, stats )
        if year not in years_processed:
            # First row for this year: start a new counter
            stats.append ( {
                "Year": year,
                "Killings": 0,
                "Charges": 0,
                "Convictions": 0
            } )
            current_index = len( stats ) - 1
        else:
            # Find the existing counter for this year
            for i, item in enumerate( stats ):
                if stats[i]["Year"] == year:
                    current_index = i
        stats[current_index]["Killings"] += 1
        if "Charged" in data[index]["Criminal Charges?"]:
            stats[current_index]["Charges"] += 1
        if "Charged, Convicted" in data[index]["Criminal Charges?"]:
            stats[current_index]["Convictions"] += 1

# The dataset lists the newest incidents first, so flip to oldest first
stats.reverse()
print( json.dumps( stats, indent = 4, sort_keys=True ) )

# Save the dataset to a CSV file
# newline = "" stops the csv module from writing blank lines on Windows
data_file = open( "data-summary.csv", "w", newline = "" )
csv_writer = csv.writer( data_file )
count = 0
for item in stats:
    if count == 0:
        header = item.keys()
        csv_writer.writerow( header )
        count += 1
    csv_writer.writerow( item.values() )
data_file.close()
```
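The same counting can also be done with a dictionary keyed by year, which avoids re-scanning `stats` for every row. This is a sketch of that alternative, not the script from the repo: `summarize` is a hypothetical function name, and it assumes the same two column names and the same two-digit year format as the script above.

```python
def summarize( rows, skip_years = ( "2021", ) ):
    """Count killings, charges and convictions per year.

    rows is any iterable of dicts, e.g. a csv.DictReader.
    """
    by_year = {}
    for row in rows:
        # Dates end in a two-digit year, e.g. "12/31/20" -> "2020"
        year = "20" + row["Date of Incident (month/day/year)"][-2:]
        if year in skip_years:
            continue
        # Create the counter for this year the first time we see it
        entry = by_year.setdefault( year, {
            "Year": year, "Killings": 0, "Charges": 0, "Convictions": 0
        } )
        entry["Killings"] += 1
        status = row["Criminal Charges?"]
        # Same substring checks as the script: a conviction also counts as a charge
        if "Charged" in status:
            entry["Charges"] += 1
        if "Charged, Convicted" in status:
            entry["Convictions"] += 1
    # Sort explicitly (oldest year first) instead of relying on row order
    return [ by_year[year] for year in sorted( by_year ) ]
```

You would call it as `summarize( csv.DictReader( csvf ) )` inside the `with open( ... )` block. Sorting by year at the end means the result no longer depends on the order of rows in the download.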
And here’s our CSV with just the data we need. I omitted data for 2021 to keep things a bit cleaner, since we’re still very early in the year. What we have here is sufficient for our purpose: showing how incredibly rare it is to convict a cop for murder, and how much effort it takes to do that.
```
Year,Killings,Charges,Convictions
2013,1087,19,6
2014,1049,20,11
2015,1102,25,9
2016,1070,19,5
2017,1091,16,4
2018,1144,14,1
2019,1094,21,1
2020,1125,16,1
```
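To put those numbers in proportion, here is a quick sketch that turns the summary into overall rates. The data is the CSV above pasted in as a string; `rates` is a name I made up for this example.

```python
import csv
import io

SUMMARY = """Year,Killings,Charges,Convictions
2013,1087,19,6
2014,1049,20,11
2015,1102,25,9
2016,1070,19,5
2017,1091,16,4
2018,1144,14,1
2019,1094,21,1
2020,1125,16,1
"""

def rates( csv_text ):
    """Return totals plus charge and conviction rates as percentages."""
    totals = { "Killings": 0, "Charges": 0, "Convictions": 0 }
    for row in csv.DictReader( io.StringIO( csv_text ) ):
        for key in totals:
            totals[key] += int( row[key] )
    return {
        **totals,
        # Rates are relative to the total number of killings
        "Charge rate (%)": round( 100 * totals["Charges"] / totals["Killings"], 2 ),
        "Conviction rate (%)": round( 100 * totals["Convictions"] / totals["Killings"], 2 ),
    }

print( rates( SUMMARY ) )
```

Across 2013 to 2020 that adds up to 8,762 killings, 150 charges and 38 convictions, so fewer than 2% of killings led to charges and well under 1% ended in a conviction.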
Once we have our dataset ready, using the Data Visualization Gutenberg Block plugin is pretty simple. To make the final visualization more interesting, and to show that there are real people behind these numbers, I added a new option to my plugin that lets you create animated borders.
