{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Practical Example\n", "\n", "Let's explore a practical example: what if we wanted to know how common homicide was in each state in the United States and identify those places where per capita homicide lowest and highest. To start with, this means we'll need some data, so we'll use data from the [U.S. Centers for Disease Control and Prevention](https://www.cdc.gov/nchs/pressroom/sosmap/homicide_mortality/homicide.htm) for homicide totals by state and data from the [U.S. Census Bureau](https://data.census.gov/cedsci/table?tid=PEPPOP2021.NST_EST2021_POP&hidePreview=false) to get population in 2020. Let's read those files into dataframes and see what we've got!\n", "\n", "*Note: we've removed some extraneous columns to make the example clearer*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NAMEPOPESTIMATE2020
0United States331501080
1Northeast Region57525633
2Midwest Region68935174
3South Region126409007
4West Region78631266
5Alabama5024803
6Alaska732441
7Arizona7177986
8Arkansas3012232
9California39499738
10Colorado5784308
11Connecticut3600260
12Delaware991886
13District of Columbia690093
14Florida21569932
15Georgia10725800
16Hawaii1451911
17Idaho1847772
18Illinois12785245
19Indiana6785644
20Iowa3188669
21Kansas2935880
22Kentucky4503958
23Louisiana4651203
24Maine1362280
25Maryland6172679
26Massachusetts7022220
27Michigan10067664
28Minnesota5707165
29Mississippi2956870
30Missouri6154481
31Montana1086193
32Nebraska1961455
33Nevada3114071
34New Hampshire1377848
35New Jersey9279743
36New Mexico2117566
37New York20154933
38North Carolina10457177
39North Dakota778962
40Ohio11790587
41Oklahoma3962031
42Oregon4241544
43Pennsylvania12989625
44Rhode Island1096229
45South Carolina5130729
46South Dakota887099
47Tennessee6920119
48Texas29217653
49Utah3281684
50Vermont642495
51Virginia8632044
52Washington7718785
53West Virginia1789798
54Wisconsin5892323
55Wyoming577267
56Puerto Rico3281538
\n", "
" ], "text/plain": [ " NAME POPESTIMATE2020\n", "0 United States 331501080\n", "1 Northeast Region 57525633\n", "2 Midwest Region 68935174\n", "3 South Region 126409007\n", "4 West Region 78631266\n", "5 Alabama 5024803\n", "6 Alaska 732441\n", "7 Arizona 7177986\n", "8 Arkansas 3012232\n", "9 California 39499738\n", "10 Colorado 5784308\n", "11 Connecticut 3600260\n", "12 Delaware 991886\n", "13 District of Columbia 690093\n", "14 Florida 21569932\n", "15 Georgia 10725800\n", "16 Hawaii 1451911\n", "17 Idaho 1847772\n", "18 Illinois 12785245\n", "19 Indiana 6785644\n", "20 Iowa 3188669\n", "21 Kansas 2935880\n", "22 Kentucky 4503958\n", "23 Louisiana 4651203\n", "24 Maine 1362280\n", "25 Maryland 6172679\n", "26 Massachusetts 7022220\n", "27 Michigan 10067664\n", "28 Minnesota 5707165\n", "29 Mississippi 2956870\n", "30 Missouri 6154481\n", "31 Montana 1086193\n", "32 Nebraska 1961455\n", "33 Nevada 3114071\n", "34 New Hampshire 1377848\n", "35 New Jersey 9279743\n", "36 New Mexico 2117566\n", "37 New York 20154933\n", "38 North Carolina 10457177\n", "39 North Dakota 778962\n", "40 Ohio 11790587\n", "41 Oklahoma 3962031\n", "42 Oregon 4241544\n", "43 Pennsylvania 12989625\n", "44 Rhode Island 1096229\n", "45 South Carolina 5130729\n", "46 South Dakota 887099\n", "47 Tennessee 6920119\n", "48 Texas 29217653\n", "49 Utah 3281684\n", "50 Vermont 642495\n", "51 Virginia 8632044\n", "52 Washington 7718785\n", "53 West Virginia 1789798\n", "54 Wisconsin 5892323\n", "55 Wyoming 577267\n", "56 Puerto Rico 3281538" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "pd.set_option(\"mode.copy_on_write\", True)\n", "\n", "population = pd.read_csv(\"data/population.csv\")\n", "population" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEDEATHS
0AL654
1AK55
2AZ525
3AR371
4CA2368
5CO336
6CT152
7DE86
8FL1530
9GA1093
10HI46
11ID42
12IL1353
13IN620
14IA106
15KS195
16KY404
17LA873
18ME21
19MD649
20MA183
21MI811
22MN198
23MS576
24MO803
25MT65
26NE76
27NV217
28NH14
29NJ362
30NM216
31NY875
32NC884
33ND30
34OH1004
35OK342
36OR157
37PA994
38RI29
39SC622
40SD52
41TN753
42TX2212
43UT95
44VT14
45VA531
46WA322
47WV114
48WI334
49WY25
\n", "
" ], "text/plain": [ " STATE DEATHS\n", "0 AL 654\n", "1 AK 55\n", "2 AZ 525\n", "3 AR 371\n", "4 CA 2368\n", "5 CO 336\n", "6 CT 152\n", "7 DE 86\n", "8 FL 1530\n", "9 GA 1093\n", "10 HI 46\n", "11 ID 42\n", "12 IL 1353\n", "13 IN 620\n", "14 IA 106\n", "15 KS 195\n", "16 KY 404\n", "17 LA 873\n", "18 ME 21\n", "19 MD 649\n", "20 MA 183\n", "21 MI 811\n", "22 MN 198\n", "23 MS 576\n", "24 MO 803\n", "25 MT 65\n", "26 NE 76\n", "27 NV 217\n", "28 NH 14\n", "29 NJ 362\n", "30 NM 216\n", "31 NY 875\n", "32 NC 884\n", "33 ND 30\n", "34 OH 1004\n", "35 OK 342\n", "36 OR 157\n", "37 PA 994\n", "38 RI 29\n", "39 SC 622\n", "40 SD 52\n", "41 TN 753\n", "42 TX 2212\n", "43 UT 95\n", "44 VT 14\n", "45 VA 531\n", "46 WA 322\n", "47 WV 114\n", "48 WI 334\n", "49 WY 25" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "homicides = pd.read_csv(\"data/homicides.csv\")\n", "homicides" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two items to note here. The first is that the population data has the full state name listed, while the homicide data only has state abbreviations. The second is that while the homicide data has 50 states, the population data has 57 since there are additional regions listed (e.g. \"United States\", \"Northeast Region\", etc.).\n", "\n", "What we'll want to do, is to add in the state names for each state in the homicide data, then merge the two datasets into one. Once we have that combined dataset, we can calculate the per capita homicide rate in each state and sort our data to determine the highest/lowest per capita.\n", "\n", "To add in the state names for each state, let's load in a dataset that contains the 50 states and their abbreviations:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateCode
0AlabamaAL
1AlaskaAK
2ArizonaAZ
3ArkansasAR
4CaliforniaCA
5ColoradoCO
6ConnecticutCT
7DelawareDE
8District of ColumbiaDC
9FloridaFL
10GeorgiaGA
11HawaiiHI
12IdahoID
13IllinoisIL
14IndianaIN
15IowaIA
16KansasKS
17KentuckyKY
18LouisianaLA
19MaineME
20MarylandMD
21MassachusettsMA
22MichiganMI
23MinnesotaMN
24MississippiMS
25MissouriMO
26MontanaMT
27NebraskaNE
28NevadaNV
29New HampshireNH
30New JerseyNJ
31New MexicoNM
32New YorkNY
33North CarolinaNC
34North DakotaND
35OhioOH
36OklahomaOK
37OregonOR
38PennsylvaniaPA
39Rhode IslandRI
40South CarolinaSC
41South DakotaSD
42TennesseeTN
43TexasTX
44UtahUT
45VermontVT
46VirginiaVA
47WashingtonWA
48West VirginiaWV
49WisconsinWI
50WyomingWY
\n", "
" ], "text/plain": [ " State Code\n", "0 Alabama AL\n", "1 Alaska AK\n", "2 Arizona AZ\n", "3 Arkansas AR\n", "4 California CA\n", "5 Colorado CO\n", "6 Connecticut CT\n", "7 Delaware DE\n", "8 District of Columbia DC\n", "9 Florida FL\n", "10 Georgia GA\n", "11 Hawaii HI\n", "12 Idaho ID\n", "13 Illinois IL\n", "14 Indiana IN\n", "15 Iowa IA\n", "16 Kansas KS\n", "17 Kentucky KY\n", "18 Louisiana LA\n", "19 Maine ME\n", "20 Maryland MD\n", "21 Massachusetts MA\n", "22 Michigan MI\n", "23 Minnesota MN\n", "24 Mississippi MS\n", "25 Missouri MO\n", "26 Montana MT\n", "27 Nebraska NE\n", "28 Nevada NV\n", "29 New Hampshire NH\n", "30 New Jersey NJ\n", "31 New Mexico NM\n", "32 New York NY\n", "33 North Carolina NC\n", "34 North Dakota ND\n", "35 Ohio OH\n", "36 Oklahoma OK\n", "37 Oregon OR\n", "38 Pennsylvania PA\n", "39 Rhode Island RI\n", "40 South Carolina SC\n", "41 South Dakota SD\n", "42 Tennessee TN\n", "43 Texas TX\n", "44 Utah UT\n", "45 Vermont VT\n", "46 Virginia VA\n", "47 Washington WA\n", "48 West Virginia WV\n", "49 Wisconsin WI\n", "50 Wyoming WY" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names = pd.read_csv(\"data/state_codes.csv\")\n", "names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see another issue here, though: there are 51 states listed with abbreviations (remember, Python starts counting at 0) since the District of Columbia is included while there are 50 states listed in the `homicides` DataFrame. Let's left-merge the data so we only keep the 50 states for which we have data. Since the column in `homicides` that has the labels is labeled 'STATE' and the column in `names` with the state labels is called 'Code', we'll use the 'left_on' and 'right_on' parameters to specify those columns:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEDEATHSStateCode
0AL654AlabamaAL
1AK55AlaskaAK
2AZ525ArizonaAZ
3AR371ArkansasAR
4CA2368CaliforniaCA
5CO336ColoradoCO
6CT152ConnecticutCT
7DE86DelawareDE
8FL1530FloridaFL
9GA1093GeorgiaGA
10HI46HawaiiHI
11ID42IdahoID
12IL1353IllinoisIL
13IN620IndianaIN
14IA106IowaIA
15KS195KansasKS
16KY404KentuckyKY
17LA873LouisianaLA
18ME21MaineME
19MD649MarylandMD
20MA183MassachusettsMA
21MI811MichiganMI
22MN198MinnesotaMN
23MS576MississippiMS
24MO803MissouriMO
25MT65MontanaMT
26NE76NebraskaNE
27NV217NevadaNV
28NH14New HampshireNH
29NJ362New JerseyNJ
30NM216New MexicoNM
31NY875New YorkNY
32NC884North CarolinaNC
33ND30North DakotaND
34OH1004OhioOH
35OK342OklahomaOK
36OR157OregonOR
37PA994PennsylvaniaPA
38RI29Rhode IslandRI
39SC622South CarolinaSC
40SD52South DakotaSD
41TN753TennesseeTN
42TX2212TexasTX
43UT95UtahUT
44VT14VermontVT
45VA531VirginiaVA
46WA322WashingtonWA
47WV114West VirginiaWV
48WI334WisconsinWI
49WY25WyomingWY
\n", "
" ], "text/plain": [ " STATE DEATHS State Code\n", "0 AL 654 Alabama AL\n", "1 AK 55 Alaska AK\n", "2 AZ 525 Arizona AZ\n", "3 AR 371 Arkansas AR\n", "4 CA 2368 California CA\n", "5 CO 336 Colorado CO\n", "6 CT 152 Connecticut CT\n", "7 DE 86 Delaware DE\n", "8 FL 1530 Florida FL\n", "9 GA 1093 Georgia GA\n", "10 HI 46 Hawaii HI\n", "11 ID 42 Idaho ID\n", "12 IL 1353 Illinois IL\n", "13 IN 620 Indiana IN\n", "14 IA 106 Iowa IA\n", "15 KS 195 Kansas KS\n", "16 KY 404 Kentucky KY\n", "17 LA 873 Louisiana LA\n", "18 ME 21 Maine ME\n", "19 MD 649 Maryland MD\n", "20 MA 183 Massachusetts MA\n", "21 MI 811 Michigan MI\n", "22 MN 198 Minnesota MN\n", "23 MS 576 Mississippi MS\n", "24 MO 803 Missouri MO\n", "25 MT 65 Montana MT\n", "26 NE 76 Nebraska NE\n", "27 NV 217 Nevada NV\n", "28 NH 14 New Hampshire NH\n", "29 NJ 362 New Jersey NJ\n", "30 NM 216 New Mexico NM\n", "31 NY 875 New York NY\n", "32 NC 884 North Carolina NC\n", "33 ND 30 North Dakota ND\n", "34 OH 1004 Ohio OH\n", "35 OK 342 Oklahoma OK\n", "36 OR 157 Oregon OR\n", "37 PA 994 Pennsylvania PA\n", "38 RI 29 Rhode Island RI\n", "39 SC 622 South Carolina SC\n", "40 SD 52 South Dakota SD\n", "41 TN 753 Tennessee TN\n", "42 TX 2212 Texas TX\n", "43 UT 95 Utah UT\n", "44 VT 14 Vermont VT\n", "45 VA 531 Virginia VA\n", "46 WA 322 Washington WA\n", "47 WV 114 West Virginia WV\n", "48 WI 334 Wisconsin WI\n", "49 WY 25 Wyoming WY" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "homicides_with_states = pd.merge(\n", " homicides, names, left_on=\"STATE\", right_on=\"Code\", how=\"left\"\n", ")\n", "homicides_with_states" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have those items combined, but we don't need the \"STATE\" or \"Code\" columns, so let's drop them using the `drop` method. We'll do this operation in place so that rather than having to write:\n", "\n", "`homicides3 = homicides_with_states.drop(columns=['STATE','Code'])`\n", "\n", "which leads to a proliferation of variables each time we want to do a new operation (`homicide3`, `homicide4`, etc.); instead we can perform the operation directly on the data using the `inplace=True` keyword." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DEATHSState
0654Alabama
155Alaska
2525Arizona
3371Arkansas
42368California
5336Colorado
6152Connecticut
786Delaware
81530Florida
91093Georgia
1046Hawaii
1142Idaho
121353Illinois
13620Indiana
14106Iowa
15195Kansas
16404Kentucky
17873Louisiana
1821Maine
19649Maryland
20183Massachusetts
21811Michigan
22198Minnesota
23576Mississippi
24803Missouri
2565Montana
2676Nebraska
27217Nevada
2814New Hampshire
29362New Jersey
30216New Mexico
31875New York
32884North Carolina
3330North Dakota
341004Ohio
35342Oklahoma
36157Oregon
37994Pennsylvania
3829Rhode Island
39622South Carolina
4052South Dakota
41753Tennessee
422212Texas
4395Utah
4414Vermont
45531Virginia
46322Washington
47114West Virginia
48334Wisconsin
4925Wyoming
\n", "
" ], "text/plain": [ " DEATHS State\n", "0 654 Alabama\n", "1 55 Alaska\n", "2 525 Arizona\n", "3 371 Arkansas\n", "4 2368 California\n", "5 336 Colorado\n", "6 152 Connecticut\n", "7 86 Delaware\n", "8 1530 Florida\n", "9 1093 Georgia\n", "10 46 Hawaii\n", "11 42 Idaho\n", "12 1353 Illinois\n", "13 620 Indiana\n", "14 106 Iowa\n", "15 195 Kansas\n", "16 404 Kentucky\n", "17 873 Louisiana\n", "18 21 Maine\n", "19 649 Maryland\n", "20 183 Massachusetts\n", "21 811 Michigan\n", "22 198 Minnesota\n", "23 576 Mississippi\n", "24 803 Missouri\n", "25 65 Montana\n", "26 76 Nebraska\n", "27 217 Nevada\n", "28 14 New Hampshire\n", "29 362 New Jersey\n", "30 216 New Mexico\n", "31 875 New York\n", "32 884 North Carolina\n", "33 30 North Dakota\n", "34 1004 Ohio\n", "35 342 Oklahoma\n", "36 157 Oregon\n", "37 994 Pennsylvania\n", "38 29 Rhode Island\n", "39 622 South Carolina\n", "40 52 South Dakota\n", "41 753 Tennessee\n", "42 2212 Texas\n", "43 95 Utah\n", "44 14 Vermont\n", "45 531 Virginia\n", "46 322 Washington\n", "47 114 West Virginia\n", "48 334 Wisconsin\n", "49 25 Wyoming" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "homicides_with_states = homicides_with_states.drop(columns=[\"STATE\", \"Code\"])\n", "homicides_with_states" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And just because it's easier to reference cleaned and ordered labels, let's rename 'DEATHS' to 'homicides' using `rename`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstate
0654Alabama
155Alaska
2525Arizona
3371Arkansas
42368California
5336Colorado
6152Connecticut
786Delaware
81530Florida
91093Georgia
1046Hawaii
1142Idaho
121353Illinois
13620Indiana
14106Iowa
15195Kansas
16404Kentucky
17873Louisiana
1821Maine
19649Maryland
20183Massachusetts
21811Michigan
22198Minnesota
23576Mississippi
24803Missouri
2565Montana
2676Nebraska
27217Nevada
2814New Hampshire
29362New Jersey
30216New Mexico
31875New York
32884North Carolina
3330North Dakota
341004Ohio
35342Oklahoma
36157Oregon
37994Pennsylvania
3829Rhode Island
39622South Carolina
4052South Dakota
41753Tennessee
422212Texas
4395Utah
4414Vermont
45531Virginia
46322Washington
47114West Virginia
48334Wisconsin
4925Wyoming
\n", "
" ], "text/plain": [ " homicides state\n", "0 654 Alabama\n", "1 55 Alaska\n", "2 525 Arizona\n", "3 371 Arkansas\n", "4 2368 California\n", "5 336 Colorado\n", "6 152 Connecticut\n", "7 86 Delaware\n", "8 1530 Florida\n", "9 1093 Georgia\n", "10 46 Hawaii\n", "11 42 Idaho\n", "12 1353 Illinois\n", "13 620 Indiana\n", "14 106 Iowa\n", "15 195 Kansas\n", "16 404 Kentucky\n", "17 873 Louisiana\n", "18 21 Maine\n", "19 649 Maryland\n", "20 183 Massachusetts\n", "21 811 Michigan\n", "22 198 Minnesota\n", "23 576 Mississippi\n", "24 803 Missouri\n", "25 65 Montana\n", "26 76 Nebraska\n", "27 217 Nevada\n", "28 14 New Hampshire\n", "29 362 New Jersey\n", "30 216 New Mexico\n", "31 875 New York\n", "32 884 North Carolina\n", "33 30 North Dakota\n", "34 1004 Ohio\n", "35 342 Oklahoma\n", "36 157 Oregon\n", "37 994 Pennsylvania\n", "38 29 Rhode Island\n", "39 622 South Carolina\n", "40 52 South Dakota\n", "41 753 Tennessee\n", "42 2212 Texas\n", "43 95 Utah\n", "44 14 Vermont\n", "45 531 Virginia\n", "46 322 Washington\n", "47 114 West Virginia\n", "48 334 Wisconsin\n", "49 25 Wyoming" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "homicides_with_states = homicides_with_states.rename(\n", " columns={\"DEATHS\": \"homicides\", \"State\": \"state\"}\n", ")\n", "homicides_with_states" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great - now we're ready to merge in our population data. Let's perform an outer merge so that we keep all our data, then we can check what's missing and remove it as appropriate." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstateNAMEPOPESTIMATE2020
0654.0AlabamaAlabama5024803
155.0AlaskaAlaska732441
2525.0ArizonaArizona7177986
3371.0ArkansasArkansas3012232
42368.0CaliforniaCalifornia39499738
5336.0ColoradoColorado5784308
6152.0ConnecticutConnecticut3600260
786.0DelawareDelaware991886
81530.0FloridaFlorida21569932
91093.0GeorgiaGeorgia10725800
1046.0HawaiiHawaii1451911
1142.0IdahoIdaho1847772
121353.0IllinoisIllinois12785245
13620.0IndianaIndiana6785644
14106.0IowaIowa3188669
15195.0KansasKansas2935880
16404.0KentuckyKentucky4503958
17873.0LouisianaLouisiana4651203
1821.0MaineMaine1362280
19649.0MarylandMaryland6172679
20183.0MassachusettsMassachusetts7022220
21811.0MichiganMichigan10067664
22198.0MinnesotaMinnesota5707165
23576.0MississippiMississippi2956870
24803.0MissouriMissouri6154481
2565.0MontanaMontana1086193
2676.0NebraskaNebraska1961455
27217.0NevadaNevada3114071
2814.0New HampshireNew Hampshire1377848
29362.0New JerseyNew Jersey9279743
30216.0New MexicoNew Mexico2117566
31875.0New YorkNew York20154933
32884.0North CarolinaNorth Carolina10457177
3330.0North DakotaNorth Dakota778962
341004.0OhioOhio11790587
35342.0OklahomaOklahoma3962031
36157.0OregonOregon4241544
37994.0PennsylvaniaPennsylvania12989625
3829.0Rhode IslandRhode Island1096229
39622.0South CarolinaSouth Carolina5130729
4052.0South DakotaSouth Dakota887099
41753.0TennesseeTennessee6920119
422212.0TexasTexas29217653
4395.0UtahUtah3281684
4414.0VermontVermont642495
45531.0VirginiaVirginia8632044
46322.0WashingtonWashington7718785
47114.0West VirginiaWest Virginia1789798
48334.0WisconsinWisconsin5892323
4925.0WyomingWyoming577267
50NaNNaNUnited States331501080
51NaNNaNNortheast Region57525633
52NaNNaNMidwest Region68935174
53NaNNaNSouth Region126409007
54NaNNaNWest Region78631266
55NaNNaNDistrict of Columbia690093
56NaNNaNPuerto Rico3281538
\n", "
" ], "text/plain": [ " homicides state NAME POPESTIMATE2020\n", "0 654.0 Alabama Alabama 5024803\n", "1 55.0 Alaska Alaska 732441\n", "2 525.0 Arizona Arizona 7177986\n", "3 371.0 Arkansas Arkansas 3012232\n", "4 2368.0 California California 39499738\n", "5 336.0 Colorado Colorado 5784308\n", "6 152.0 Connecticut Connecticut 3600260\n", "7 86.0 Delaware Delaware 991886\n", "8 1530.0 Florida Florida 21569932\n", "9 1093.0 Georgia Georgia 10725800\n", "10 46.0 Hawaii Hawaii 1451911\n", "11 42.0 Idaho Idaho 1847772\n", "12 1353.0 Illinois Illinois 12785245\n", "13 620.0 Indiana Indiana 6785644\n", "14 106.0 Iowa Iowa 3188669\n", "15 195.0 Kansas Kansas 2935880\n", "16 404.0 Kentucky Kentucky 4503958\n", "17 873.0 Louisiana Louisiana 4651203\n", "18 21.0 Maine Maine 1362280\n", "19 649.0 Maryland Maryland 6172679\n", "20 183.0 Massachusetts Massachusetts 7022220\n", "21 811.0 Michigan Michigan 10067664\n", "22 198.0 Minnesota Minnesota 5707165\n", "23 576.0 Mississippi Mississippi 2956870\n", "24 803.0 Missouri Missouri 6154481\n", "25 65.0 Montana Montana 1086193\n", "26 76.0 Nebraska Nebraska 1961455\n", "27 217.0 Nevada Nevada 3114071\n", "28 14.0 New Hampshire New Hampshire 1377848\n", "29 362.0 New Jersey New Jersey 9279743\n", "30 216.0 New Mexico New Mexico 2117566\n", "31 875.0 New York New York 20154933\n", "32 884.0 North Carolina North Carolina 10457177\n", "33 30.0 North Dakota North Dakota 778962\n", "34 1004.0 Ohio Ohio 11790587\n", "35 342.0 Oklahoma Oklahoma 3962031\n", "36 157.0 Oregon Oregon 4241544\n", "37 994.0 Pennsylvania Pennsylvania 12989625\n", "38 29.0 Rhode Island Rhode Island 1096229\n", "39 622.0 South Carolina South Carolina 5130729\n", "40 52.0 South Dakota South Dakota 887099\n", "41 753.0 Tennessee Tennessee 6920119\n", "42 2212.0 Texas Texas 29217653\n", "43 95.0 Utah Utah 3281684\n", "44 14.0 Vermont Vermont 642495\n", "45 531.0 Virginia Virginia 8632044\n", "46 322.0 Washington Washington 7718785\n", "47 114.0 West Virginia West Virginia 1789798\n", "48 334.0 Wisconsin Wisconsin 5892323\n", "49 25.0 Wyoming Wyoming 577267\n", "50 NaN NaN United States 331501080\n", "51 NaN NaN Northeast Region 57525633\n", "52 NaN NaN Midwest Region 68935174\n", "53 NaN NaN South Region 126409007\n", "54 NaN NaN West Region 78631266\n", "55 NaN NaN District of Columbia 690093\n", "56 NaN NaN Puerto Rico 3281538" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined = pd.merge(\n", " homicides_with_states, population, how=\"outer\", left_on=\"state\", right_on=\"NAME\"\n", ")\n", "combined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see from the above that we have a number of `NaN` values. Let's look at rows with any `NaN` values:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstateNAMEPOPESTIMATE2020
50NaNNaNUnited States331501080
51NaNNaNNortheast Region57525633
52NaNNaNMidwest Region68935174
53NaNNaNSouth Region126409007
54NaNNaNWest Region78631266
55NaNNaNDistrict of Columbia690093
56NaNNaNPuerto Rico3281538
\n", "
" ], "text/plain": [ " homicides state NAME POPESTIMATE2020\n", "50 NaN NaN United States 331501080\n", "51 NaN NaN Northeast Region 57525633\n", "52 NaN NaN Midwest Region 68935174\n", "53 NaN NaN South Region 126409007\n", "54 NaN NaN West Region 78631266\n", "55 NaN NaN District of Columbia 690093\n", "56 NaN NaN Puerto Rico 3281538" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined[combined.isna().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's just break down the above statement for a moment for comprehension. `combined.isna()` returns a True value for each entry in the DataFrame with a `NaN` value. The `.any(axis=1)` looks across axis 1 and if any of the entries in that row (across all columns) has a True value, then that row is represented with a True. Let's look at the steps so it's clear:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstateNAMEPOPESTIMATE2020
0FalseFalseFalseFalse
1FalseFalseFalseFalse
2FalseFalseFalseFalse
3FalseFalseFalseFalse
4FalseFalseFalseFalse
5FalseFalseFalseFalse
6FalseFalseFalseFalse
7FalseFalseFalseFalse
8FalseFalseFalseFalse
9FalseFalseFalseFalse
10FalseFalseFalseFalse
11FalseFalseFalseFalse
12FalseFalseFalseFalse
13FalseFalseFalseFalse
14FalseFalseFalseFalse
15FalseFalseFalseFalse
16FalseFalseFalseFalse
17FalseFalseFalseFalse
18FalseFalseFalseFalse
19FalseFalseFalseFalse
20FalseFalseFalseFalse
21FalseFalseFalseFalse
22FalseFalseFalseFalse
23FalseFalseFalseFalse
24FalseFalseFalseFalse
25FalseFalseFalseFalse
26FalseFalseFalseFalse
27FalseFalseFalseFalse
28FalseFalseFalseFalse
29FalseFalseFalseFalse
30FalseFalseFalseFalse
31FalseFalseFalseFalse
32FalseFalseFalseFalse
33FalseFalseFalseFalse
34FalseFalseFalseFalse
35FalseFalseFalseFalse
36FalseFalseFalseFalse
37FalseFalseFalseFalse
38FalseFalseFalseFalse
39FalseFalseFalseFalse
40FalseFalseFalseFalse
41FalseFalseFalseFalse
42FalseFalseFalseFalse
43FalseFalseFalseFalse
44FalseFalseFalseFalse
45FalseFalseFalseFalse
46FalseFalseFalseFalse
47FalseFalseFalseFalse
48FalseFalseFalseFalse
49FalseFalseFalseFalse
50TrueTrueFalseFalse
51TrueTrueFalseFalse
52TrueTrueFalseFalse
53TrueTrueFalseFalse
54TrueTrueFalseFalse
55TrueTrueFalseFalse
56TrueTrueFalseFalse
\n", "
" ], "text/plain": [ " homicides state NAME POPESTIMATE2020\n", "0 False False False False\n", "1 False False False False\n", "2 False False False False\n", "3 False False False False\n", "4 False False False False\n", "5 False False False False\n", "6 False False False False\n", "7 False False False False\n", "8 False False False False\n", "9 False False False False\n", "10 False False False False\n", "11 False False False False\n", "12 False False False False\n", "13 False False False False\n", "14 False False False False\n", "15 False False False False\n", "16 False False False False\n", "17 False False False False\n", "18 False False False False\n", "19 False False False False\n", "20 False False False False\n", "21 False False False False\n", "22 False False False False\n", "23 False False False False\n", "24 False False False False\n", "25 False False False False\n", "26 False False False False\n", "27 False False False False\n", "28 False False False False\n", "29 False False False False\n", "30 False False False False\n", "31 False False False False\n", "32 False False False False\n", "33 False False False False\n", "34 False False False False\n", "35 False False False False\n", "36 False False False False\n", "37 False False False False\n", "38 False False False False\n", "39 False False False False\n", "40 False False False False\n", "41 False False False False\n", "42 False False False False\n", "43 False False False False\n", "44 False False False False\n", "45 False False False False\n", "46 False False False False\n", "47 False False False False\n", "48 False False False False\n", "49 False False False False\n", "50 True True False False\n", "51 True True False False\n", "52 True True False False\n", "53 True True False False\n", "54 True True False False\n", "55 True True False False\n", "56 True True False False" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined.isna()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 False\n", "5 False\n", "6 False\n", "7 False\n", "8 False\n", "9 False\n", "10 False\n", "11 False\n", "12 False\n", "13 False\n", "14 False\n", "15 False\n", "16 False\n", "17 False\n", "18 False\n", "19 False\n", "20 False\n", "21 False\n", "22 False\n", "23 False\n", "24 False\n", "25 False\n", "26 False\n", "27 False\n", "28 False\n", "29 False\n", "30 False\n", "31 False\n", "32 False\n", "33 False\n", "34 False\n", "35 False\n", "36 False\n", "37 False\n", "38 False\n", "39 False\n", "40 False\n", "41 False\n", "42 False\n", "43 False\n", "44 False\n", "45 False\n", "46 False\n", "47 False\n", "48 False\n", "49 False\n", "50 True\n", "51 True\n", "52 True\n", "53 True\n", "54 True\n", "55 True\n", "56 True\n", "dtype: bool" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined.isna().any(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So when we put it altogether, it only displaces those rows with `NaN` values:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstateNAMEPOPESTIMATE2020
50NaNNaNUnited States331501080
51NaNNaNNortheast Region57525633
52NaNNaNMidwest Region68935174
53NaNNaNSouth Region126409007
54NaNNaNWest Region78631266
55NaNNaNDistrict of Columbia690093
56NaNNaNPuerto Rico3281538
\n", "
" ], "text/plain": [ " homicides state NAME POPESTIMATE2020\n", "50 NaN NaN United States 331501080\n", "51 NaN NaN Northeast Region 57525633\n", "52 NaN NaN Midwest Region 68935174\n", "53 NaN NaN South Region 126409007\n", "54 NaN NaN West Region 78631266\n", "55 NaN NaN District of Columbia 690093\n", "56 NaN NaN Puerto Rico 3281538" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined[combined.isna().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we can see here is that we don't need any of these regions, as none are U.S. states, so let's drop them using the `dropna` method:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstateNAMEPOPESTIMATE2020
0654.0AlabamaAlabama5024803
155.0AlaskaAlaska732441
2525.0ArizonaArizona7177986
3371.0ArkansasArkansas3012232
42368.0CaliforniaCalifornia39499738
5336.0ColoradoColorado5784308
6152.0ConnecticutConnecticut3600260
786.0DelawareDelaware991886
81530.0FloridaFlorida21569932
91093.0GeorgiaGeorgia10725800
1046.0HawaiiHawaii1451911
1142.0IdahoIdaho1847772
121353.0IllinoisIllinois12785245
13620.0IndianaIndiana6785644
14106.0IowaIowa3188669
15195.0KansasKansas2935880
16404.0KentuckyKentucky4503958
17873.0LouisianaLouisiana4651203
1821.0MaineMaine1362280
19649.0MarylandMaryland6172679
20183.0MassachusettsMassachusetts7022220
21811.0MichiganMichigan10067664
22198.0MinnesotaMinnesota5707165
23576.0MississippiMississippi2956870
24803.0MissouriMissouri6154481
2565.0MontanaMontana1086193
2676.0NebraskaNebraska1961455
27217.0NevadaNevada3114071
2814.0New HampshireNew Hampshire1377848
29362.0New JerseyNew Jersey9279743
30216.0New MexicoNew Mexico2117566
31875.0New YorkNew York20154933
32884.0North CarolinaNorth Carolina10457177
3330.0North DakotaNorth Dakota778962
341004.0OhioOhio11790587
35342.0OklahomaOklahoma3962031
36157.0OregonOregon4241544
37994.0PennsylvaniaPennsylvania12989625
3829.0Rhode IslandRhode Island1096229
39622.0South CarolinaSouth Carolina5130729
4052.0South DakotaSouth Dakota887099
41753.0TennesseeTennessee6920119
422212.0TexasTexas29217653
4395.0UtahUtah3281684
4414.0VermontVermont642495
45531.0VirginiaVirginia8632044
46322.0WashingtonWashington7718785
47114.0West VirginiaWest Virginia1789798
48334.0WisconsinWisconsin5892323
4925.0WyomingWyoming577267
\n", "
" ], "text/plain": [ " homicides state NAME POPESTIMATE2020\n", "0 654.0 Alabama Alabama 5024803\n", "1 55.0 Alaska Alaska 732441\n", "2 525.0 Arizona Arizona 7177986\n", "3 371.0 Arkansas Arkansas 3012232\n", "4 2368.0 California California 39499738\n", "5 336.0 Colorado Colorado 5784308\n", "6 152.0 Connecticut Connecticut 3600260\n", "7 86.0 Delaware Delaware 991886\n", "8 1530.0 Florida Florida 21569932\n", "9 1093.0 Georgia Georgia 10725800\n", "10 46.0 Hawaii Hawaii 1451911\n", "11 42.0 Idaho Idaho 1847772\n", "12 1353.0 Illinois Illinois 12785245\n", "13 620.0 Indiana Indiana 6785644\n", "14 106.0 Iowa Iowa 3188669\n", "15 195.0 Kansas Kansas 2935880\n", "16 404.0 Kentucky Kentucky 4503958\n", "17 873.0 Louisiana Louisiana 4651203\n", "18 21.0 Maine Maine 1362280\n", "19 649.0 Maryland Maryland 6172679\n", "20 183.0 Massachusetts Massachusetts 7022220\n", "21 811.0 Michigan Michigan 10067664\n", "22 198.0 Minnesota Minnesota 5707165\n", "23 576.0 Mississippi Mississippi 2956870\n", "24 803.0 Missouri Missouri 6154481\n", "25 65.0 Montana Montana 1086193\n", "26 76.0 Nebraska Nebraska 1961455\n", "27 217.0 Nevada Nevada 3114071\n", "28 14.0 New Hampshire New Hampshire 1377848\n", "29 362.0 New Jersey New Jersey 9279743\n", "30 216.0 New Mexico New Mexico 2117566\n", "31 875.0 New York New York 20154933\n", "32 884.0 North Carolina North Carolina 10457177\n", "33 30.0 North Dakota North Dakota 778962\n", "34 1004.0 Ohio Ohio 11790587\n", "35 342.0 Oklahoma Oklahoma 3962031\n", "36 157.0 Oregon Oregon 4241544\n", "37 994.0 Pennsylvania Pennsylvania 12989625\n", "38 29.0 Rhode Island Rhode Island 1096229\n", "39 622.0 South Carolina South Carolina 5130729\n", "40 52.0 South Dakota South Dakota 887099\n", "41 753.0 Tennessee Tennessee 6920119\n", "42 2212.0 Texas Texas 29217653\n", "43 95.0 Utah Utah 3281684\n", "44 14.0 Vermont Vermont 642495\n", "45 531.0 Virginia Virginia 8632044\n", "46 322.0 Washington Washington 7718785\n", "47 114.0 West Virginia West Virginia 1789798\n", "48 334.0 Wisconsin Wisconsin 5892323\n", "49 25.0 Wyoming Wyoming 577267" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined = combined.dropna(axis=0)\n", "combined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have our data let's take one more step to clean the formatting of our data. Let's drop the extra column with state names and rename 'POPESTIMATE2020' to 'population':" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstatepopulation
0654.0Alabama5024803
155.0Alaska732441
2525.0Arizona7177986
3371.0Arkansas3012232
42368.0California39499738
5336.0Colorado5784308
6152.0Connecticut3600260
786.0Delaware991886
81530.0Florida21569932
91093.0Georgia10725800
1046.0Hawaii1451911
1142.0Idaho1847772
121353.0Illinois12785245
13620.0Indiana6785644
14106.0Iowa3188669
15195.0Kansas2935880
16404.0Kentucky4503958
17873.0Louisiana4651203
1821.0Maine1362280
19649.0Maryland6172679
20183.0Massachusetts7022220
21811.0Michigan10067664
22198.0Minnesota5707165
23576.0Mississippi2956870
24803.0Missouri6154481
2565.0Montana1086193
2676.0Nebraska1961455
27217.0Nevada3114071
2814.0New Hampshire1377848
29362.0New Jersey9279743
30216.0New Mexico2117566
31875.0New York20154933
32884.0North Carolina10457177
3330.0North Dakota778962
341004.0Ohio11790587
35342.0Oklahoma3962031
36157.0Oregon4241544
37994.0Pennsylvania12989625
3829.0Rhode Island1096229
39622.0South Carolina5130729
4052.0South Dakota887099
41753.0Tennessee6920119
422212.0Texas29217653
4395.0Utah3281684
4414.0Vermont642495
45531.0Virginia8632044
46322.0Washington7718785
47114.0West Virginia1789798
48334.0Wisconsin5892323
4925.0Wyoming577267
\n", "
" ], "text/plain": [ " homicides state population\n", "0 654.0 Alabama 5024803\n", "1 55.0 Alaska 732441\n", "2 525.0 Arizona 7177986\n", "3 371.0 Arkansas 3012232\n", "4 2368.0 California 39499738\n", "5 336.0 Colorado 5784308\n", "6 152.0 Connecticut 3600260\n", "7 86.0 Delaware 991886\n", "8 1530.0 Florida 21569932\n", "9 1093.0 Georgia 10725800\n", "10 46.0 Hawaii 1451911\n", "11 42.0 Idaho 1847772\n", "12 1353.0 Illinois 12785245\n", "13 620.0 Indiana 6785644\n", "14 106.0 Iowa 3188669\n", "15 195.0 Kansas 2935880\n", "16 404.0 Kentucky 4503958\n", "17 873.0 Louisiana 4651203\n", "18 21.0 Maine 1362280\n", "19 649.0 Maryland 6172679\n", "20 183.0 Massachusetts 7022220\n", "21 811.0 Michigan 10067664\n", "22 198.0 Minnesota 5707165\n", "23 576.0 Mississippi 2956870\n", "24 803.0 Missouri 6154481\n", "25 65.0 Montana 1086193\n", "26 76.0 Nebraska 1961455\n", "27 217.0 Nevada 3114071\n", "28 14.0 New Hampshire 1377848\n", "29 362.0 New Jersey 9279743\n", "30 216.0 New Mexico 2117566\n", "31 875.0 New York 20154933\n", "32 884.0 North Carolina 10457177\n", "33 30.0 North Dakota 778962\n", "34 1004.0 Ohio 11790587\n", "35 342.0 Oklahoma 3962031\n", "36 157.0 Oregon 4241544\n", "37 994.0 Pennsylvania 12989625\n", "38 29.0 Rhode Island 1096229\n", "39 622.0 South Carolina 5130729\n", "40 52.0 South Dakota 887099\n", "41 753.0 Tennessee 6920119\n", "42 2212.0 Texas 29217653\n", "43 95.0 Utah 3281684\n", "44 14.0 Vermont 642495\n", "45 531.0 Virginia 8632044\n", "46 322.0 Washington 7718785\n", "47 114.0 West Virginia 1789798\n", "48 334.0 Wisconsin 5892323\n", "49 25.0 Wyoming 577267" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined = combined.drop(columns=[\"NAME\"])\n", "combined = combined.rename(columns={\"POPESTIMATE2020\": \"population\"})\n", "combined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're ready to conduct our analysis. Let's create a new column called 'homicide_rate' and it will be the number of homicides divided by the state population. Since this number is typically a small value, this is often reported as the number of homicides per 100,000 people, so we just multiply the result by 100,000." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstatepopulationhomicide_rate
0654.0Alabama502480313.015436
155.0Alaska7324417.509137
2525.0Arizona71779867.314029
3371.0Arkansas301223212.316448
42368.0California394997385.994976
5336.0Colorado57843085.808819
6152.0Connecticut36002604.221917
786.0Delaware9918868.670351
81530.0Florida215699327.093207
91093.0Georgia1072580010.190382
1046.0Hawaii14519113.168238
1142.0Idaho18477722.273008
121353.0Illinois1278524510.582511
13620.0Indiana67856449.136937
14106.0Iowa31886693.324271
15195.0Kansas29358806.641961
16404.0Kentucky45039588.969888
17873.0Louisiana465120318.769338
1821.0Maine13622801.541533
19649.0Maryland617267910.514073
20183.0Massachusetts70222202.606013
21811.0Michigan100676648.055493
22198.0Minnesota57071653.469323
23576.0Mississippi295687019.480058
24803.0Missouri615448113.047404
2565.0Montana10861935.984204
2676.0Nebraska19614553.874675
27217.0Nevada31140716.968370
2814.0New Hampshire13778481.016077
29362.0New Jersey92797433.900970
30216.0New Mexico211756610.200390
31875.0New York201549334.341369
32884.0North Carolina104571778.453524
3330.0North Dakota7789623.851279
341004.0Ohio117905878.515267
35342.0Oklahoma39620318.631936
36157.0Oregon42415443.701482
37994.0Pennsylvania129896257.652261
3829.0Rhode Island10962292.645433
39622.0South Carolina513072912.123034
4052.0South Dakota8870995.861803
41753.0Tennessee692011910.881316
422212.0Texas292176537.570766
4395.0Utah32816842.894855
4414.0Vermont6424952.179005
45531.0Virginia86320446.151498
46322.0Washington77187854.171641
47114.0West Virginia17897986.369434
48334.0Wisconsin58923235.668393
4925.0Wyoming5772674.330752
\n", "
" ], "text/plain": [ " homicides state population homicide_rate\n", "0 654.0 Alabama 5024803 13.015436\n", "1 55.0 Alaska 732441 7.509137\n", "2 525.0 Arizona 7177986 7.314029\n", "3 371.0 Arkansas 3012232 12.316448\n", "4 2368.0 California 39499738 5.994976\n", "5 336.0 Colorado 5784308 5.808819\n", "6 152.0 Connecticut 3600260 4.221917\n", "7 86.0 Delaware 991886 8.670351\n", "8 1530.0 Florida 21569932 7.093207\n", "9 1093.0 Georgia 10725800 10.190382\n", "10 46.0 Hawaii 1451911 3.168238\n", "11 42.0 Idaho 1847772 2.273008\n", "12 1353.0 Illinois 12785245 10.582511\n", "13 620.0 Indiana 6785644 9.136937\n", "14 106.0 Iowa 3188669 3.324271\n", "15 195.0 Kansas 2935880 6.641961\n", "16 404.0 Kentucky 4503958 8.969888\n", "17 873.0 Louisiana 4651203 18.769338\n", "18 21.0 Maine 1362280 1.541533\n", "19 649.0 Maryland 6172679 10.514073\n", "20 183.0 Massachusetts 7022220 2.606013\n", "21 811.0 Michigan 10067664 8.055493\n", "22 198.0 Minnesota 5707165 3.469323\n", "23 576.0 Mississippi 2956870 19.480058\n", "24 803.0 Missouri 6154481 13.047404\n", "25 65.0 Montana 1086193 5.984204\n", "26 76.0 Nebraska 1961455 3.874675\n", "27 217.0 Nevada 3114071 6.968370\n", "28 14.0 New Hampshire 1377848 1.016077\n", "29 362.0 New Jersey 9279743 3.900970\n", "30 216.0 New Mexico 2117566 10.200390\n", "31 875.0 New York 20154933 4.341369\n", "32 884.0 North Carolina 10457177 8.453524\n", "33 30.0 North Dakota 778962 3.851279\n", "34 1004.0 Ohio 11790587 8.515267\n", "35 342.0 Oklahoma 3962031 8.631936\n", "36 157.0 Oregon 4241544 3.701482\n", "37 994.0 Pennsylvania 12989625 7.652261\n", "38 29.0 Rhode Island 1096229 2.645433\n", "39 622.0 South Carolina 5130729 12.123034\n", "40 52.0 South Dakota 887099 5.861803\n", "41 753.0 Tennessee 6920119 10.881316\n", "42 2212.0 Texas 29217653 7.570766\n", "43 95.0 Utah 3281684 2.894855\n", "44 14.0 Vermont 642495 2.179005\n", "45 531.0 Virginia 8632044 6.151498\n", "46 322.0 Washington 7718785 4.171641\n", "47 114.0 West Virginia 1789798 6.369434\n", "48 334.0 Wisconsin 5892323 5.668393\n", "49 25.0 Wyoming 577267 4.330752" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined[\"homicide_rate\"] = combined[\"homicides\"] / combined[\"population\"] * 100000\n", "combined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, let's sort the data by 'homicide_rate' to see what the states are with the highest and lowest rates:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
homicidesstatepopulationhomicide_rate
2814.0New Hampshire13778481.016077
1821.0Maine13622801.541533
4414.0Vermont6424952.179005
1142.0Idaho18477722.273008
20183.0Massachusetts70222202.606013
3829.0Rhode Island10962292.645433
4395.0Utah32816842.894855
1046.0Hawaii14519113.168238
14106.0Iowa31886693.324271
22198.0Minnesota57071653.469323
36157.0Oregon42415443.701482
3330.0North Dakota7789623.851279
2676.0Nebraska19614553.874675
29362.0New Jersey92797433.900970
46322.0Washington77187854.171641
6152.0Connecticut36002604.221917
4925.0Wyoming5772674.330752
31875.0New York201549334.341369
48334.0Wisconsin58923235.668393
5336.0Colorado57843085.808819
4052.0South Dakota8870995.861803
2565.0Montana10861935.984204
42368.0California394997385.994976
45531.0Virginia86320446.151498
47114.0West Virginia17897986.369434
15195.0Kansas29358806.641961
27217.0Nevada31140716.968370
81530.0Florida215699327.093207
2525.0Arizona71779867.314029
155.0Alaska7324417.509137
422212.0Texas292176537.570766
37994.0Pennsylvania129896257.652261
21811.0Michigan100676648.055493
32884.0North Carolina104571778.453524
341004.0Ohio117905878.515267
35342.0Oklahoma39620318.631936
786.0Delaware9918868.670351
16404.0Kentucky45039588.969888
13620.0Indiana67856449.136937
91093.0Georgia1072580010.190382
30216.0New Mexico211756610.200390
19649.0Maryland617267910.514073
121353.0Illinois1278524510.582511
41753.0Tennessee692011910.881316
39622.0South Carolina513072912.123034
3371.0Arkansas301223212.316448
0654.0Alabama502480313.015436
24803.0Missouri615448113.047404
17873.0Louisiana465120318.769338
23576.0Mississippi295687019.480058
\n", "
" ], "text/plain": [ " homicides state population homicide_rate\n", "28 14.0 New Hampshire 1377848 1.016077\n", "18 21.0 Maine 1362280 1.541533\n", "44 14.0 Vermont 642495 2.179005\n", "11 42.0 Idaho 1847772 2.273008\n", "20 183.0 Massachusetts 7022220 2.606013\n", "38 29.0 Rhode Island 1096229 2.645433\n", "43 95.0 Utah 3281684 2.894855\n", "10 46.0 Hawaii 1451911 3.168238\n", "14 106.0 Iowa 3188669 3.324271\n", "22 198.0 Minnesota 5707165 3.469323\n", "36 157.0 Oregon 4241544 3.701482\n", "33 30.0 North Dakota 778962 3.851279\n", "26 76.0 Nebraska 1961455 3.874675\n", "29 362.0 New Jersey 9279743 3.900970\n", "46 322.0 Washington 7718785 4.171641\n", "6 152.0 Connecticut 3600260 4.221917\n", "49 25.0 Wyoming 577267 4.330752\n", "31 875.0 New York 20154933 4.341369\n", "48 334.0 Wisconsin 5892323 5.668393\n", "5 336.0 Colorado 5784308 5.808819\n", "40 52.0 South Dakota 887099 5.861803\n", "25 65.0 Montana 1086193 5.984204\n", "4 2368.0 California 39499738 5.994976\n", "45 531.0 Virginia 8632044 6.151498\n", "47 114.0 West Virginia 1789798 6.369434\n", "15 195.0 Kansas 2935880 6.641961\n", "27 217.0 Nevada 3114071 6.968370\n", "8 1530.0 Florida 21569932 7.093207\n", "2 525.0 Arizona 7177986 7.314029\n", "1 55.0 Alaska 732441 7.509137\n", "42 2212.0 Texas 29217653 7.570766\n", "37 994.0 Pennsylvania 12989625 7.652261\n", "21 811.0 Michigan 10067664 8.055493\n", "32 884.0 North Carolina 10457177 8.453524\n", "34 1004.0 Ohio 11790587 8.515267\n", "35 342.0 Oklahoma 3962031 8.631936\n", "7 86.0 Delaware 991886 8.670351\n", "16 404.0 Kentucky 4503958 8.969888\n", "13 620.0 Indiana 6785644 9.136937\n", "9 1093.0 Georgia 10725800 10.190382\n", "30 216.0 New Mexico 2117566 10.200390\n", "19 649.0 Maryland 6172679 10.514073\n", "12 1353.0 Illinois 12785245 10.582511\n", "41 753.0 Tennessee 6920119 10.881316\n", "39 622.0 South Carolina 5130729 12.123034\n", "3 371.0 Arkansas 3012232 12.316448\n", "0 654.0 Alabama 5024803 13.015436\n", "24 803.0 Missouri 6154481 13.047404\n", "17 873.0 Louisiana 4651203 18.769338\n", "23 576.0 Mississippi 2956870 19.480058" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined = combined.sort_values(by=\"homicide_rate\")\n", "combined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the states of New Hampshire, Maine, Vermont, Idaho, and Massachusetts, each have homicide rates between and 1 and 2.6 people per 100,000. Arkansas, Alabama, Missouri, Louisiana, and Mississippi have far higher rates ranging from 12.3 to 19.5 people per 100,000. The homicide rate is nearly 20 times higher in Mississippi than it is in New Hampshire.\n", "\n", "All of the skills and tools shown here scale up to larger datasets as well. As we can see, data preparation was a significant part of the effort and being able to programmatically complete the process was essential. Data preparation requires careful design to take care of potential imperfections, omissions, or mismatches in the data. Once taken care of, the subsequent analysis can proceed for us to derive insights from our data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "In this section, we saw that merging is a key tool for combining datasets and is able to accommodate many of the common types of merges that we would want to execute. Merging data allows us to integrate the information from two DataFrames logically and coherently to prepare it for further analysis and querying, but care must be taken in structuring the merge so that we don't lose data or introduce unwanted `NaN` values. Next, you'll get some practice merging data. Once we have our data combined, in an upcoming lesson, we'll explore how to group our data by its columns." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 ('ds')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "a8f7447a67778e188a9289d5afd8f46b343fa90d5cd10b8a7763899ed796f96a" } } }, "nbformat": 4, "nbformat_minor": 2 }