Exploring the Mapping Police Violence dataset


Keeping with the theme of my previous data visualization blog post, I’m going to use my Data Visualization Gutenberg Block plugin to create a summary of the data from the Mapping Police Violence project.

Here’s the finished visualization; continue below for a few quick thoughts on what the data shows and a brief walkthrough of how I put the chart together.

Phet Gouvonvong, Andrew Brown, Ma’Khia Bryant, Doward Sylleen Baker, Antonio Cantu, Edgar Luis Tirado, Bradley Michael Olsen, Larry Jenkins, Smoky Lynn Crockett, Matthew Harry, Robert Douglas Delgado, Alex Garcia, Jacob Wood, Marcelo Garcia, Lindani Myeni, Peyton Ham, Christopher Templo Marquez, Anthony Thompson Jr., Pier Alexander Shelton, Matthew Zadok Williams, Miles Jackson, Daunte Wright, Faustin Guetigo, Joshua Mitchell, Rescue Eram, Joshua Michael Johnson, Douglas Barton, DeShund Tanner, David William Determan, Paige Pierce Schmidt, Dominique Williams, James Alexander, James Lionel Johnson, Roger Cornelius Allen, Devin Wyteagle Kuykendall, Stephanie Nicole Volkin, Tyler R. Green, Iremamber Sykap, Roy Kenneth Jackel Jr., Silas Lambert, Desmon Montez Ray, Jose Arenas, Diwone Wallace, Gabriel Casso, Jeffrey W. Appelt, Juan Carlos Estrada, Samuel Yeager, Angel Nelson, Natzeryt Layahoshua Viertel, Noah R. Green, DeShawn Latiwon Tatum, James Andrew Iler, Steven Ross Glass, Willie Roy Allen, Aaron Christopher Pouche, Anthony Alvarez, Ivan Cuevas, Jeffrey Ely, Lance Montgomery Powell
Year  Killings  Charges  Convictions
2013  1,087     19       6
2014  1,049     20       11
2015  1,102     25       9
2016  1,070     19       5
2017  1,091     16       4
2018  1,144     14       1
2019  1,094     21       1
2020  1,125     16       1

Here are the basic steps for recreating the dataset, which you can also find in this GitHub repo.

  1. Download MPVDatasetDownload.xlsx from mappingpoliceviolence.org.
  2. Open the file, remove the unnecessary columns, and export it to MPVDatasetDownload.csv.
  3. Run python data-cleanup.py to create data-summary.csv.
  4. Upload the data using my Data Visualization Gutenberg Block plugin.

The cleanup script is pretty straightforward.

import csv
import json

# Aggregate per-year counts, keyed by year for quick lookup
totals = {}

with open( "MPVDatasetDownload.csv", encoding = "utf-8" ) as csvf:
    for row in csv.DictReader( csvf ):
        # The date column ends with a two-digit year, e.g. "4/21/21"
        year = "20" + row["Date of Incident (month/day/year)"][-2:]

        # Skip data for 2021 that's still in progress
        if year == "2021":
            continue

        stats_for_year = totals.setdefault( year, {
            "Year": year,
            "Killings": 0,
            "Charges": 0,
            "Convictions": 0
        } )

        stats_for_year["Killings"] += 1

        charges = row["Criminal Charges?"]

        if "Charged" in charges:
            stats_for_year["Charges"] += 1

        if "Charged, Convicted" in charges:
            stats_for_year["Convictions"] += 1

# Sort oldest year first
stats = sorted( totals.values(), key = lambda item: item["Year"] )

print( json.dumps( stats, indent = 4, sort_keys = True ) )

# Save the dataset to a CSV file
with open( "data-summary.csv", "w", newline = "" ) as data_file:
    csv_writer = csv.writer( data_file )
    csv_writer.writerow( stats[0].keys() )

    for item in stats:
        csv_writer.writerow( item.values() )
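One detail worth noting: because the string "Charged, Convicted" contains the substring "Charged", every conviction is also counted as a charge, so the Convictions column is always a subset of Charges. A quick illustration (the example values here are hypothetical, not the dataset's full vocabulary for the "Criminal Charges?" column):

```python
# Hypothetical values for the "Criminal Charges?" column
examples = ["No known charges", "Charged, Convicted", "Charged, Acquitted"]

for value in examples:
    charged = "Charged" in value
    convicted = "Charged, Convicted" in value
    print(f"{value!r}: charged={charged}, convicted={convicted}")

# Any row counted as a conviction is also counted as a charge
assert all("Charged" in value for value in examples if "Charged, Convicted" in value)
```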

And here’s our CSV with just the data we need. I omitted 2021 to keep things a bit cleaner, since we’re still very early into the year, and what we have here is sufficient for our purpose: showing how incredibly rare it is to convict a cop for murder, and how much effort it takes to do so.

Year,Killings,Charges,Convictions
2013,1087,19,6
2014,1049,20,11
2015,1102,25,9
2016,1070,19,5
2017,1091,16,4
2018,1144,14,1
2019,1094,21,1
2020,1125,16,1

Once we have our dataset ready, using the Data Visualization Gutenberg Block plugin is pretty simple. To make the final visualization more interesting, and to show that there are real people behind these numbers, I added a new option to my plugin that lets you create animated borders.

A screenshot of the Text Border field, which lets you add an animated text border to your data visualization.

And to wrap up this blog post, let me share a few useful resources on police accountability and racial justice:
