{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Practical Example\n",
"\n",
"Let's explore a practical example: what if we wanted to know how common homicide was in each state in the United States and identify those places where per capita homicide lowest and highest. To start with, this means we'll need some data, so we'll use data from the [U.S. Centers for Disease Control and Prevention](https://www.cdc.gov/nchs/pressroom/sosmap/homicide_mortality/homicide.htm) for homicide totals by state and data from the [U.S. Census Bureau](https://data.census.gov/cedsci/table?tid=PEPPOP2021.NST_EST2021_POP&hidePreview=false) to get population in 2020. Let's read those files into dataframes and see what we've got!\n",
"\n",
"*Note: we've removed some extraneous columns to make the example clearer*"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" NAME | \n",
" POPESTIMATE2020 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" United States | \n",
" 331501080 | \n",
"
\n",
" \n",
" 1 | \n",
" Northeast Region | \n",
" 57525633 | \n",
"
\n",
" \n",
" 2 | \n",
" Midwest Region | \n",
" 68935174 | \n",
"
\n",
" \n",
" 3 | \n",
" South Region | \n",
" 126409007 | \n",
"
\n",
" \n",
" 4 | \n",
" West Region | \n",
" 78631266 | \n",
"
\n",
" \n",
" 5 | \n",
" Alabama | \n",
" 5024803 | \n",
"
\n",
" \n",
" 6 | \n",
" Alaska | \n",
" 732441 | \n",
"
\n",
" \n",
" 7 | \n",
" Arizona | \n",
" 7177986 | \n",
"
\n",
" \n",
" 8 | \n",
" Arkansas | \n",
" 3012232 | \n",
"
\n",
" \n",
" 9 | \n",
" California | \n",
" 39499738 | \n",
"
\n",
" \n",
" 10 | \n",
" Colorado | \n",
" 5784308 | \n",
"
\n",
" \n",
" 11 | \n",
" Connecticut | \n",
" 3600260 | \n",
"
\n",
" \n",
" 12 | \n",
" Delaware | \n",
" 991886 | \n",
"
\n",
" \n",
" 13 | \n",
" District of Columbia | \n",
" 690093 | \n",
"
\n",
" \n",
" 14 | \n",
" Florida | \n",
" 21569932 | \n",
"
\n",
" \n",
" 15 | \n",
" Georgia | \n",
" 10725800 | \n",
"
\n",
" \n",
" 16 | \n",
" Hawaii | \n",
" 1451911 | \n",
"
\n",
" \n",
" 17 | \n",
" Idaho | \n",
" 1847772 | \n",
"
\n",
" \n",
" 18 | \n",
" Illinois | \n",
" 12785245 | \n",
"
\n",
" \n",
" 19 | \n",
" Indiana | \n",
" 6785644 | \n",
"
\n",
" \n",
" 20 | \n",
" Iowa | \n",
" 3188669 | \n",
"
\n",
" \n",
" 21 | \n",
" Kansas | \n",
" 2935880 | \n",
"
\n",
" \n",
" 22 | \n",
" Kentucky | \n",
" 4503958 | \n",
"
\n",
" \n",
" 23 | \n",
" Louisiana | \n",
" 4651203 | \n",
"
\n",
" \n",
" 24 | \n",
" Maine | \n",
" 1362280 | \n",
"
\n",
" \n",
" 25 | \n",
" Maryland | \n",
" 6172679 | \n",
"
\n",
" \n",
" 26 | \n",
" Massachusetts | \n",
" 7022220 | \n",
"
\n",
" \n",
" 27 | \n",
" Michigan | \n",
" 10067664 | \n",
"
\n",
" \n",
" 28 | \n",
" Minnesota | \n",
" 5707165 | \n",
"
\n",
" \n",
" 29 | \n",
" Mississippi | \n",
" 2956870 | \n",
"
\n",
" \n",
" 30 | \n",
" Missouri | \n",
" 6154481 | \n",
"
\n",
" \n",
" 31 | \n",
" Montana | \n",
" 1086193 | \n",
"
\n",
" \n",
" 32 | \n",
" Nebraska | \n",
" 1961455 | \n",
"
\n",
" \n",
" 33 | \n",
" Nevada | \n",
" 3114071 | \n",
"
\n",
" \n",
" 34 | \n",
" New Hampshire | \n",
" 1377848 | \n",
"
\n",
" \n",
" 35 | \n",
" New Jersey | \n",
" 9279743 | \n",
"
\n",
" \n",
" 36 | \n",
" New Mexico | \n",
" 2117566 | \n",
"
\n",
" \n",
" 37 | \n",
" New York | \n",
" 20154933 | \n",
"
\n",
" \n",
" 38 | \n",
" North Carolina | \n",
" 10457177 | \n",
"
\n",
" \n",
" 39 | \n",
" North Dakota | \n",
" 778962 | \n",
"
\n",
" \n",
" 40 | \n",
" Ohio | \n",
" 11790587 | \n",
"
\n",
" \n",
" 41 | \n",
" Oklahoma | \n",
" 3962031 | \n",
"
\n",
" \n",
" 42 | \n",
" Oregon | \n",
" 4241544 | \n",
"
\n",
" \n",
" 43 | \n",
" Pennsylvania | \n",
" 12989625 | \n",
"
\n",
" \n",
" 44 | \n",
" Rhode Island | \n",
" 1096229 | \n",
"
\n",
" \n",
" 45 | \n",
" South Carolina | \n",
" 5130729 | \n",
"
\n",
" \n",
" 46 | \n",
" South Dakota | \n",
" 887099 | \n",
"
\n",
" \n",
" 47 | \n",
" Tennessee | \n",
" 6920119 | \n",
"
\n",
" \n",
" 48 | \n",
" Texas | \n",
" 29217653 | \n",
"
\n",
" \n",
" 49 | \n",
" Utah | \n",
" 3281684 | \n",
"
\n",
" \n",
" 50 | \n",
" Vermont | \n",
" 642495 | \n",
"
\n",
" \n",
" 51 | \n",
" Virginia | \n",
" 8632044 | \n",
"
\n",
" \n",
" 52 | \n",
" Washington | \n",
" 7718785 | \n",
"
\n",
" \n",
" 53 | \n",
" West Virginia | \n",
" 1789798 | \n",
"
\n",
" \n",
" 54 | \n",
" Wisconsin | \n",
" 5892323 | \n",
"
\n",
" \n",
" 55 | \n",
" Wyoming | \n",
" 577267 | \n",
"
\n",
" \n",
" 56 | \n",
" Puerto Rico | \n",
" 3281538 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" NAME POPESTIMATE2020\n",
"0 United States 331501080\n",
"1 Northeast Region 57525633\n",
"2 Midwest Region 68935174\n",
"3 South Region 126409007\n",
"4 West Region 78631266\n",
"5 Alabama 5024803\n",
"6 Alaska 732441\n",
"7 Arizona 7177986\n",
"8 Arkansas 3012232\n",
"9 California 39499738\n",
"10 Colorado 5784308\n",
"11 Connecticut 3600260\n",
"12 Delaware 991886\n",
"13 District of Columbia 690093\n",
"14 Florida 21569932\n",
"15 Georgia 10725800\n",
"16 Hawaii 1451911\n",
"17 Idaho 1847772\n",
"18 Illinois 12785245\n",
"19 Indiana 6785644\n",
"20 Iowa 3188669\n",
"21 Kansas 2935880\n",
"22 Kentucky 4503958\n",
"23 Louisiana 4651203\n",
"24 Maine 1362280\n",
"25 Maryland 6172679\n",
"26 Massachusetts 7022220\n",
"27 Michigan 10067664\n",
"28 Minnesota 5707165\n",
"29 Mississippi 2956870\n",
"30 Missouri 6154481\n",
"31 Montana 1086193\n",
"32 Nebraska 1961455\n",
"33 Nevada 3114071\n",
"34 New Hampshire 1377848\n",
"35 New Jersey 9279743\n",
"36 New Mexico 2117566\n",
"37 New York 20154933\n",
"38 North Carolina 10457177\n",
"39 North Dakota 778962\n",
"40 Ohio 11790587\n",
"41 Oklahoma 3962031\n",
"42 Oregon 4241544\n",
"43 Pennsylvania 12989625\n",
"44 Rhode Island 1096229\n",
"45 South Carolina 5130729\n",
"46 South Dakota 887099\n",
"47 Tennessee 6920119\n",
"48 Texas 29217653\n",
"49 Utah 3281684\n",
"50 Vermont 642495\n",
"51 Virginia 8632044\n",
"52 Washington 7718785\n",
"53 West Virginia 1789798\n",
"54 Wisconsin 5892323\n",
"55 Wyoming 577267\n",
"56 Puerto Rico 3281538"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"pd.set_option(\"mode.copy_on_write\", True)\n",
"\n",
"population = pd.read_csv(\"data/population.csv\")\n",
"population"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" STATE | \n",
" DEATHS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 654 | \n",
"
\n",
" \n",
" 1 | \n",
" AK | \n",
" 55 | \n",
"
\n",
" \n",
" 2 | \n",
" AZ | \n",
" 525 | \n",
"
\n",
" \n",
" 3 | \n",
" AR | \n",
" 371 | \n",
"
\n",
" \n",
" 4 | \n",
" CA | \n",
" 2368 | \n",
"
\n",
" \n",
" 5 | \n",
" CO | \n",
" 336 | \n",
"
\n",
" \n",
" 6 | \n",
" CT | \n",
" 152 | \n",
"
\n",
" \n",
" 7 | \n",
" DE | \n",
" 86 | \n",
"
\n",
" \n",
" 8 | \n",
" FL | \n",
" 1530 | \n",
"
\n",
" \n",
" 9 | \n",
" GA | \n",
" 1093 | \n",
"
\n",
" \n",
" 10 | \n",
" HI | \n",
" 46 | \n",
"
\n",
" \n",
" 11 | \n",
" ID | \n",
" 42 | \n",
"
\n",
" \n",
" 12 | \n",
" IL | \n",
" 1353 | \n",
"
\n",
" \n",
" 13 | \n",
" IN | \n",
" 620 | \n",
"
\n",
" \n",
" 14 | \n",
" IA | \n",
" 106 | \n",
"
\n",
" \n",
" 15 | \n",
" KS | \n",
" 195 | \n",
"
\n",
" \n",
" 16 | \n",
" KY | \n",
" 404 | \n",
"
\n",
" \n",
" 17 | \n",
" LA | \n",
" 873 | \n",
"
\n",
" \n",
" 18 | \n",
" ME | \n",
" 21 | \n",
"
\n",
" \n",
" 19 | \n",
" MD | \n",
" 649 | \n",
"
\n",
" \n",
" 20 | \n",
" MA | \n",
" 183 | \n",
"
\n",
" \n",
" 21 | \n",
" MI | \n",
" 811 | \n",
"
\n",
" \n",
" 22 | \n",
" MN | \n",
" 198 | \n",
"
\n",
" \n",
" 23 | \n",
" MS | \n",
" 576 | \n",
"
\n",
" \n",
" 24 | \n",
" MO | \n",
" 803 | \n",
"
\n",
" \n",
" 25 | \n",
" MT | \n",
" 65 | \n",
"
\n",
" \n",
" 26 | \n",
" NE | \n",
" 76 | \n",
"
\n",
" \n",
" 27 | \n",
" NV | \n",
" 217 | \n",
"
\n",
" \n",
" 28 | \n",
" NH | \n",
" 14 | \n",
"
\n",
" \n",
" 29 | \n",
" NJ | \n",
" 362 | \n",
"
\n",
" \n",
" 30 | \n",
" NM | \n",
" 216 | \n",
"
\n",
" \n",
" 31 | \n",
" NY | \n",
" 875 | \n",
"
\n",
" \n",
" 32 | \n",
" NC | \n",
" 884 | \n",
"
\n",
" \n",
" 33 | \n",
" ND | \n",
" 30 | \n",
"
\n",
" \n",
" 34 | \n",
" OH | \n",
" 1004 | \n",
"
\n",
" \n",
" 35 | \n",
" OK | \n",
" 342 | \n",
"
\n",
" \n",
" 36 | \n",
" OR | \n",
" 157 | \n",
"
\n",
" \n",
" 37 | \n",
" PA | \n",
" 994 | \n",
"
\n",
" \n",
" 38 | \n",
" RI | \n",
" 29 | \n",
"
\n",
" \n",
" 39 | \n",
" SC | \n",
" 622 | \n",
"
\n",
" \n",
" 40 | \n",
" SD | \n",
" 52 | \n",
"
\n",
" \n",
" 41 | \n",
" TN | \n",
" 753 | \n",
"
\n",
" \n",
" 42 | \n",
" TX | \n",
" 2212 | \n",
"
\n",
" \n",
" 43 | \n",
" UT | \n",
" 95 | \n",
"
\n",
" \n",
" 44 | \n",
" VT | \n",
" 14 | \n",
"
\n",
" \n",
" 45 | \n",
" VA | \n",
" 531 | \n",
"
\n",
" \n",
" 46 | \n",
" WA | \n",
" 322 | \n",
"
\n",
" \n",
" 47 | \n",
" WV | \n",
" 114 | \n",
"
\n",
" \n",
" 48 | \n",
" WI | \n",
" 334 | \n",
"
\n",
" \n",
" 49 | \n",
" WY | \n",
" 25 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" STATE DEATHS\n",
"0 AL 654\n",
"1 AK 55\n",
"2 AZ 525\n",
"3 AR 371\n",
"4 CA 2368\n",
"5 CO 336\n",
"6 CT 152\n",
"7 DE 86\n",
"8 FL 1530\n",
"9 GA 1093\n",
"10 HI 46\n",
"11 ID 42\n",
"12 IL 1353\n",
"13 IN 620\n",
"14 IA 106\n",
"15 KS 195\n",
"16 KY 404\n",
"17 LA 873\n",
"18 ME 21\n",
"19 MD 649\n",
"20 MA 183\n",
"21 MI 811\n",
"22 MN 198\n",
"23 MS 576\n",
"24 MO 803\n",
"25 MT 65\n",
"26 NE 76\n",
"27 NV 217\n",
"28 NH 14\n",
"29 NJ 362\n",
"30 NM 216\n",
"31 NY 875\n",
"32 NC 884\n",
"33 ND 30\n",
"34 OH 1004\n",
"35 OK 342\n",
"36 OR 157\n",
"37 PA 994\n",
"38 RI 29\n",
"39 SC 622\n",
"40 SD 52\n",
"41 TN 753\n",
"42 TX 2212\n",
"43 UT 95\n",
"44 VT 14\n",
"45 VA 531\n",
"46 WA 322\n",
"47 WV 114\n",
"48 WI 334\n",
"49 WY 25"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"homicides = pd.read_csv(\"data/homicides.csv\")\n",
"homicides"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two items to note here. The first is that the population data has the full state name listed, while the homicide data only has state abbreviations. The second is that while the homicide data has 50 states, the population data has 57 since there are additional regions listed (e.g. \"United States\", \"Northeast Region\", etc.).\n",
"\n",
"What we'll want to do, is to add in the state names for each state in the homicide data, then merge the two datasets into one. Once we have that combined dataset, we can calculate the per capita homicide rate in each state and sort our data to determine the highest/lowest per capita.\n",
"\n",
"To add in the state names for each state, let's load in a dataset that contains the 50 states and their abbreviations:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" State | \n",
" Code | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Alabama | \n",
" AL | \n",
"
\n",
" \n",
" 1 | \n",
" Alaska | \n",
" AK | \n",
"
\n",
" \n",
" 2 | \n",
" Arizona | \n",
" AZ | \n",
"
\n",
" \n",
" 3 | \n",
" Arkansas | \n",
" AR | \n",
"
\n",
" \n",
" 4 | \n",
" California | \n",
" CA | \n",
"
\n",
" \n",
" 5 | \n",
" Colorado | \n",
" CO | \n",
"
\n",
" \n",
" 6 | \n",
" Connecticut | \n",
" CT | \n",
"
\n",
" \n",
" 7 | \n",
" Delaware | \n",
" DE | \n",
"
\n",
" \n",
" 8 | \n",
" District of Columbia | \n",
" DC | \n",
"
\n",
" \n",
" 9 | \n",
" Florida | \n",
" FL | \n",
"
\n",
" \n",
" 10 | \n",
" Georgia | \n",
" GA | \n",
"
\n",
" \n",
" 11 | \n",
" Hawaii | \n",
" HI | \n",
"
\n",
" \n",
" 12 | \n",
" Idaho | \n",
" ID | \n",
"
\n",
" \n",
" 13 | \n",
" Illinois | \n",
" IL | \n",
"
\n",
" \n",
" 14 | \n",
" Indiana | \n",
" IN | \n",
"
\n",
" \n",
" 15 | \n",
" Iowa | \n",
" IA | \n",
"
\n",
" \n",
" 16 | \n",
" Kansas | \n",
" KS | \n",
"
\n",
" \n",
" 17 | \n",
" Kentucky | \n",
" KY | \n",
"
\n",
" \n",
" 18 | \n",
" Louisiana | \n",
" LA | \n",
"
\n",
" \n",
" 19 | \n",
" Maine | \n",
" ME | \n",
"
\n",
" \n",
" 20 | \n",
" Maryland | \n",
" MD | \n",
"
\n",
" \n",
" 21 | \n",
" Massachusetts | \n",
" MA | \n",
"
\n",
" \n",
" 22 | \n",
" Michigan | \n",
" MI | \n",
"
\n",
" \n",
" 23 | \n",
" Minnesota | \n",
" MN | \n",
"
\n",
" \n",
" 24 | \n",
" Mississippi | \n",
" MS | \n",
"
\n",
" \n",
" 25 | \n",
" Missouri | \n",
" MO | \n",
"
\n",
" \n",
" 26 | \n",
" Montana | \n",
" MT | \n",
"
\n",
" \n",
" 27 | \n",
" Nebraska | \n",
" NE | \n",
"
\n",
" \n",
" 28 | \n",
" Nevada | \n",
" NV | \n",
"
\n",
" \n",
" 29 | \n",
" New Hampshire | \n",
" NH | \n",
"
\n",
" \n",
" 30 | \n",
" New Jersey | \n",
" NJ | \n",
"
\n",
" \n",
" 31 | \n",
" New Mexico | \n",
" NM | \n",
"
\n",
" \n",
" 32 | \n",
" New York | \n",
" NY | \n",
"
\n",
" \n",
" 33 | \n",
" North Carolina | \n",
" NC | \n",
"
\n",
" \n",
" 34 | \n",
" North Dakota | \n",
" ND | \n",
"
\n",
" \n",
" 35 | \n",
" Ohio | \n",
" OH | \n",
"
\n",
" \n",
" 36 | \n",
" Oklahoma | \n",
" OK | \n",
"
\n",
" \n",
" 37 | \n",
" Oregon | \n",
" OR | \n",
"
\n",
" \n",
" 38 | \n",
" Pennsylvania | \n",
" PA | \n",
"
\n",
" \n",
" 39 | \n",
" Rhode Island | \n",
" RI | \n",
"
\n",
" \n",
" 40 | \n",
" South Carolina | \n",
" SC | \n",
"
\n",
" \n",
" 41 | \n",
" South Dakota | \n",
" SD | \n",
"
\n",
" \n",
" 42 | \n",
" Tennessee | \n",
" TN | \n",
"
\n",
" \n",
" 43 | \n",
" Texas | \n",
" TX | \n",
"
\n",
" \n",
" 44 | \n",
" Utah | \n",
" UT | \n",
"
\n",
" \n",
" 45 | \n",
" Vermont | \n",
" VT | \n",
"
\n",
" \n",
" 46 | \n",
" Virginia | \n",
" VA | \n",
"
\n",
" \n",
" 47 | \n",
" Washington | \n",
" WA | \n",
"
\n",
" \n",
" 48 | \n",
" West Virginia | \n",
" WV | \n",
"
\n",
" \n",
" 49 | \n",
" Wisconsin | \n",
" WI | \n",
"
\n",
" \n",
" 50 | \n",
" Wyoming | \n",
" WY | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" State Code\n",
"0 Alabama AL\n",
"1 Alaska AK\n",
"2 Arizona AZ\n",
"3 Arkansas AR\n",
"4 California CA\n",
"5 Colorado CO\n",
"6 Connecticut CT\n",
"7 Delaware DE\n",
"8 District of Columbia DC\n",
"9 Florida FL\n",
"10 Georgia GA\n",
"11 Hawaii HI\n",
"12 Idaho ID\n",
"13 Illinois IL\n",
"14 Indiana IN\n",
"15 Iowa IA\n",
"16 Kansas KS\n",
"17 Kentucky KY\n",
"18 Louisiana LA\n",
"19 Maine ME\n",
"20 Maryland MD\n",
"21 Massachusetts MA\n",
"22 Michigan MI\n",
"23 Minnesota MN\n",
"24 Mississippi MS\n",
"25 Missouri MO\n",
"26 Montana MT\n",
"27 Nebraska NE\n",
"28 Nevada NV\n",
"29 New Hampshire NH\n",
"30 New Jersey NJ\n",
"31 New Mexico NM\n",
"32 New York NY\n",
"33 North Carolina NC\n",
"34 North Dakota ND\n",
"35 Ohio OH\n",
"36 Oklahoma OK\n",
"37 Oregon OR\n",
"38 Pennsylvania PA\n",
"39 Rhode Island RI\n",
"40 South Carolina SC\n",
"41 South Dakota SD\n",
"42 Tennessee TN\n",
"43 Texas TX\n",
"44 Utah UT\n",
"45 Vermont VT\n",
"46 Virginia VA\n",
"47 Washington WA\n",
"48 West Virginia WV\n",
"49 Wisconsin WI\n",
"50 Wyoming WY"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"names = pd.read_csv(\"data/state_codes.csv\")\n",
"names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see another issue here, though: there are 51 states listed with abbreviations (remember, Python starts counting at 0) since the District of Columbia is included while there are 50 states listed in the `homicides` DataFrame. Let's left-merge the data so we only keep the 50 states for which we have data. Since the column in `homicides` that has the labels is labeled 'STATE' and the column in `names` with the state labels is called 'Code', we'll use the 'left_on' and 'right_on' parameters to specify those columns:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" STATE | \n",
" DEATHS | \n",
" State | \n",
" Code | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 654 | \n",
" Alabama | \n",
" AL | \n",
"
\n",
" \n",
" 1 | \n",
" AK | \n",
" 55 | \n",
" Alaska | \n",
" AK | \n",
"
\n",
" \n",
" 2 | \n",
" AZ | \n",
" 525 | \n",
" Arizona | \n",
" AZ | \n",
"
\n",
" \n",
" 3 | \n",
" AR | \n",
" 371 | \n",
" Arkansas | \n",
" AR | \n",
"
\n",
" \n",
" 4 | \n",
" CA | \n",
" 2368 | \n",
" California | \n",
" CA | \n",
"
\n",
" \n",
" 5 | \n",
" CO | \n",
" 336 | \n",
" Colorado | \n",
" CO | \n",
"
\n",
" \n",
" 6 | \n",
" CT | \n",
" 152 | \n",
" Connecticut | \n",
" CT | \n",
"
\n",
" \n",
" 7 | \n",
" DE | \n",
" 86 | \n",
" Delaware | \n",
" DE | \n",
"
\n",
" \n",
" 8 | \n",
" FL | \n",
" 1530 | \n",
" Florida | \n",
" FL | \n",
"
\n",
" \n",
" 9 | \n",
" GA | \n",
" 1093 | \n",
" Georgia | \n",
" GA | \n",
"
\n",
" \n",
" 10 | \n",
" HI | \n",
" 46 | \n",
" Hawaii | \n",
" HI | \n",
"
\n",
" \n",
" 11 | \n",
" ID | \n",
" 42 | \n",
" Idaho | \n",
" ID | \n",
"
\n",
" \n",
" 12 | \n",
" IL | \n",
" 1353 | \n",
" Illinois | \n",
" IL | \n",
"
\n",
" \n",
" 13 | \n",
" IN | \n",
" 620 | \n",
" Indiana | \n",
" IN | \n",
"
\n",
" \n",
" 14 | \n",
" IA | \n",
" 106 | \n",
" Iowa | \n",
" IA | \n",
"
\n",
" \n",
" 15 | \n",
" KS | \n",
" 195 | \n",
" Kansas | \n",
" KS | \n",
"
\n",
" \n",
" 16 | \n",
" KY | \n",
" 404 | \n",
" Kentucky | \n",
" KY | \n",
"
\n",
" \n",
" 17 | \n",
" LA | \n",
" 873 | \n",
" Louisiana | \n",
" LA | \n",
"
\n",
" \n",
" 18 | \n",
" ME | \n",
" 21 | \n",
" Maine | \n",
" ME | \n",
"
\n",
" \n",
" 19 | \n",
" MD | \n",
" 649 | \n",
" Maryland | \n",
" MD | \n",
"
\n",
" \n",
" 20 | \n",
" MA | \n",
" 183 | \n",
" Massachusetts | \n",
" MA | \n",
"
\n",
" \n",
" 21 | \n",
" MI | \n",
" 811 | \n",
" Michigan | \n",
" MI | \n",
"
\n",
" \n",
" 22 | \n",
" MN | \n",
" 198 | \n",
" Minnesota | \n",
" MN | \n",
"
\n",
" \n",
" 23 | \n",
" MS | \n",
" 576 | \n",
" Mississippi | \n",
" MS | \n",
"
\n",
" \n",
" 24 | \n",
" MO | \n",
" 803 | \n",
" Missouri | \n",
" MO | \n",
"
\n",
" \n",
" 25 | \n",
" MT | \n",
" 65 | \n",
" Montana | \n",
" MT | \n",
"
\n",
" \n",
" 26 | \n",
" NE | \n",
" 76 | \n",
" Nebraska | \n",
" NE | \n",
"
\n",
" \n",
" 27 | \n",
" NV | \n",
" 217 | \n",
" Nevada | \n",
" NV | \n",
"
\n",
" \n",
" 28 | \n",
" NH | \n",
" 14 | \n",
" New Hampshire | \n",
" NH | \n",
"
\n",
" \n",
" 29 | \n",
" NJ | \n",
" 362 | \n",
" New Jersey | \n",
" NJ | \n",
"
\n",
" \n",
" 30 | \n",
" NM | \n",
" 216 | \n",
" New Mexico | \n",
" NM | \n",
"
\n",
" \n",
" 31 | \n",
" NY | \n",
" 875 | \n",
" New York | \n",
" NY | \n",
"
\n",
" \n",
" 32 | \n",
" NC | \n",
" 884 | \n",
" North Carolina | \n",
" NC | \n",
"
\n",
" \n",
" 33 | \n",
" ND | \n",
" 30 | \n",
" North Dakota | \n",
" ND | \n",
"
\n",
" \n",
" 34 | \n",
" OH | \n",
" 1004 | \n",
" Ohio | \n",
" OH | \n",
"
\n",
" \n",
" 35 | \n",
" OK | \n",
" 342 | \n",
" Oklahoma | \n",
" OK | \n",
"
\n",
" \n",
" 36 | \n",
" OR | \n",
" 157 | \n",
" Oregon | \n",
" OR | \n",
"
\n",
" \n",
" 37 | \n",
" PA | \n",
" 994 | \n",
" Pennsylvania | \n",
" PA | \n",
"
\n",
" \n",
" 38 | \n",
" RI | \n",
" 29 | \n",
" Rhode Island | \n",
" RI | \n",
"
\n",
" \n",
" 39 | \n",
" SC | \n",
" 622 | \n",
" South Carolina | \n",
" SC | \n",
"
\n",
" \n",
" 40 | \n",
" SD | \n",
" 52 | \n",
" South Dakota | \n",
" SD | \n",
"
\n",
" \n",
" 41 | \n",
" TN | \n",
" 753 | \n",
" Tennessee | \n",
" TN | \n",
"
\n",
" \n",
" 42 | \n",
" TX | \n",
" 2212 | \n",
" Texas | \n",
" TX | \n",
"
\n",
" \n",
" 43 | \n",
" UT | \n",
" 95 | \n",
" Utah | \n",
" UT | \n",
"
\n",
" \n",
" 44 | \n",
" VT | \n",
" 14 | \n",
" Vermont | \n",
" VT | \n",
"
\n",
" \n",
" 45 | \n",
" VA | \n",
" 531 | \n",
" Virginia | \n",
" VA | \n",
"
\n",
" \n",
" 46 | \n",
" WA | \n",
" 322 | \n",
" Washington | \n",
" WA | \n",
"
\n",
" \n",
" 47 | \n",
" WV | \n",
" 114 | \n",
" West Virginia | \n",
" WV | \n",
"
\n",
" \n",
" 48 | \n",
" WI | \n",
" 334 | \n",
" Wisconsin | \n",
" WI | \n",
"
\n",
" \n",
" 49 | \n",
" WY | \n",
" 25 | \n",
" Wyoming | \n",
" WY | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" STATE DEATHS State Code\n",
"0 AL 654 Alabama AL\n",
"1 AK 55 Alaska AK\n",
"2 AZ 525 Arizona AZ\n",
"3 AR 371 Arkansas AR\n",
"4 CA 2368 California CA\n",
"5 CO 336 Colorado CO\n",
"6 CT 152 Connecticut CT\n",
"7 DE 86 Delaware DE\n",
"8 FL 1530 Florida FL\n",
"9 GA 1093 Georgia GA\n",
"10 HI 46 Hawaii HI\n",
"11 ID 42 Idaho ID\n",
"12 IL 1353 Illinois IL\n",
"13 IN 620 Indiana IN\n",
"14 IA 106 Iowa IA\n",
"15 KS 195 Kansas KS\n",
"16 KY 404 Kentucky KY\n",
"17 LA 873 Louisiana LA\n",
"18 ME 21 Maine ME\n",
"19 MD 649 Maryland MD\n",
"20 MA 183 Massachusetts MA\n",
"21 MI 811 Michigan MI\n",
"22 MN 198 Minnesota MN\n",
"23 MS 576 Mississippi MS\n",
"24 MO 803 Missouri MO\n",
"25 MT 65 Montana MT\n",
"26 NE 76 Nebraska NE\n",
"27 NV 217 Nevada NV\n",
"28 NH 14 New Hampshire NH\n",
"29 NJ 362 New Jersey NJ\n",
"30 NM 216 New Mexico NM\n",
"31 NY 875 New York NY\n",
"32 NC 884 North Carolina NC\n",
"33 ND 30 North Dakota ND\n",
"34 OH 1004 Ohio OH\n",
"35 OK 342 Oklahoma OK\n",
"36 OR 157 Oregon OR\n",
"37 PA 994 Pennsylvania PA\n",
"38 RI 29 Rhode Island RI\n",
"39 SC 622 South Carolina SC\n",
"40 SD 52 South Dakota SD\n",
"41 TN 753 Tennessee TN\n",
"42 TX 2212 Texas TX\n",
"43 UT 95 Utah UT\n",
"44 VT 14 Vermont VT\n",
"45 VA 531 Virginia VA\n",
"46 WA 322 Washington WA\n",
"47 WV 114 West Virginia WV\n",
"48 WI 334 Wisconsin WI\n",
"49 WY 25 Wyoming WY"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"homicides_with_states = pd.merge(\n",
" homicides, names, left_on=\"STATE\", right_on=\"Code\", how=\"left\"\n",
")\n",
"homicides_with_states"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have those items combined, but we don't need the \"STATE\" or \"Code\" columns, so let's drop them using the `drop` method. We'll do this operation in place so that rather than having to write:\n",
"\n",
"`homicides3 = homicides_with_states.drop(columns=['STATE','Code'])`\n",
"\n",
"which leads to a proliferation of variables each time we want to do a new operation (`homicide3`, `homicide4`, etc.); instead we can perform the operation directly on the data using the `inplace=True` keyword."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" DEATHS | \n",
" State | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 654 | \n",
" Alabama | \n",
"
\n",
" \n",
" 1 | \n",
" 55 | \n",
" Alaska | \n",
"
\n",
" \n",
" 2 | \n",
" 525 | \n",
" Arizona | \n",
"
\n",
" \n",
" 3 | \n",
" 371 | \n",
" Arkansas | \n",
"
\n",
" \n",
" 4 | \n",
" 2368 | \n",
" California | \n",
"
\n",
" \n",
" 5 | \n",
" 336 | \n",
" Colorado | \n",
"
\n",
" \n",
" 6 | \n",
" 152 | \n",
" Connecticut | \n",
"
\n",
" \n",
" 7 | \n",
" 86 | \n",
" Delaware | \n",
"
\n",
" \n",
" 8 | \n",
" 1530 | \n",
" Florida | \n",
"
\n",
" \n",
" 9 | \n",
" 1093 | \n",
" Georgia | \n",
"
\n",
" \n",
" 10 | \n",
" 46 | \n",
" Hawaii | \n",
"
\n",
" \n",
" 11 | \n",
" 42 | \n",
" Idaho | \n",
"
\n",
" \n",
" 12 | \n",
" 1353 | \n",
" Illinois | \n",
"
\n",
" \n",
" 13 | \n",
" 620 | \n",
" Indiana | \n",
"
\n",
" \n",
" 14 | \n",
" 106 | \n",
" Iowa | \n",
"
\n",
" \n",
" 15 | \n",
" 195 | \n",
" Kansas | \n",
"
\n",
" \n",
" 16 | \n",
" 404 | \n",
" Kentucky | \n",
"
\n",
" \n",
" 17 | \n",
" 873 | \n",
" Louisiana | \n",
"
\n",
" \n",
" 18 | \n",
" 21 | \n",
" Maine | \n",
"
\n",
" \n",
" 19 | \n",
" 649 | \n",
" Maryland | \n",
"
\n",
" \n",
" 20 | \n",
" 183 | \n",
" Massachusetts | \n",
"
\n",
" \n",
" 21 | \n",
" 811 | \n",
" Michigan | \n",
"
\n",
" \n",
" 22 | \n",
" 198 | \n",
" Minnesota | \n",
"
\n",
" \n",
" 23 | \n",
" 576 | \n",
" Mississippi | \n",
"
\n",
" \n",
" 24 | \n",
" 803 | \n",
" Missouri | \n",
"
\n",
" \n",
" 25 | \n",
" 65 | \n",
" Montana | \n",
"
\n",
" \n",
" 26 | \n",
" 76 | \n",
" Nebraska | \n",
"
\n",
" \n",
" 27 | \n",
" 217 | \n",
" Nevada | \n",
"
\n",
" \n",
" 28 | \n",
" 14 | \n",
" New Hampshire | \n",
"
\n",
" \n",
" 29 | \n",
" 362 | \n",
" New Jersey | \n",
"
\n",
" \n",
" 30 | \n",
" 216 | \n",
" New Mexico | \n",
"
\n",
" \n",
" 31 | \n",
" 875 | \n",
" New York | \n",
"
\n",
" \n",
" 32 | \n",
" 884 | \n",
" North Carolina | \n",
"
\n",
" \n",
" 33 | \n",
" 30 | \n",
" North Dakota | \n",
"
\n",
" \n",
" 34 | \n",
" 1004 | \n",
" Ohio | \n",
"
\n",
" \n",
" 35 | \n",
" 342 | \n",
" Oklahoma | \n",
"
\n",
" \n",
" 36 | \n",
" 157 | \n",
" Oregon | \n",
"
\n",
" \n",
" 37 | \n",
" 994 | \n",
" Pennsylvania | \n",
"
\n",
" \n",
" 38 | \n",
" 29 | \n",
" Rhode Island | \n",
"
\n",
" \n",
" 39 | \n",
" 622 | \n",
" South Carolina | \n",
"
\n",
" \n",
" 40 | \n",
" 52 | \n",
" South Dakota | \n",
"
\n",
" \n",
" 41 | \n",
" 753 | \n",
" Tennessee | \n",
"
\n",
" \n",
" 42 | \n",
" 2212 | \n",
" Texas | \n",
"
\n",
" \n",
" 43 | \n",
" 95 | \n",
" Utah | \n",
"
\n",
" \n",
" 44 | \n",
" 14 | \n",
" Vermont | \n",
"
\n",
" \n",
" 45 | \n",
" 531 | \n",
" Virginia | \n",
"
\n",
" \n",
" 46 | \n",
" 322 | \n",
" Washington | \n",
"
\n",
" \n",
" 47 | \n",
" 114 | \n",
" West Virginia | \n",
"
\n",
" \n",
" 48 | \n",
" 334 | \n",
" Wisconsin | \n",
"
\n",
" \n",
" 49 | \n",
" 25 | \n",
" Wyoming | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" DEATHS State\n",
"0 654 Alabama\n",
"1 55 Alaska\n",
"2 525 Arizona\n",
"3 371 Arkansas\n",
"4 2368 California\n",
"5 336 Colorado\n",
"6 152 Connecticut\n",
"7 86 Delaware\n",
"8 1530 Florida\n",
"9 1093 Georgia\n",
"10 46 Hawaii\n",
"11 42 Idaho\n",
"12 1353 Illinois\n",
"13 620 Indiana\n",
"14 106 Iowa\n",
"15 195 Kansas\n",
"16 404 Kentucky\n",
"17 873 Louisiana\n",
"18 21 Maine\n",
"19 649 Maryland\n",
"20 183 Massachusetts\n",
"21 811 Michigan\n",
"22 198 Minnesota\n",
"23 576 Mississippi\n",
"24 803 Missouri\n",
"25 65 Montana\n",
"26 76 Nebraska\n",
"27 217 Nevada\n",
"28 14 New Hampshire\n",
"29 362 New Jersey\n",
"30 216 New Mexico\n",
"31 875 New York\n",
"32 884 North Carolina\n",
"33 30 North Dakota\n",
"34 1004 Ohio\n",
"35 342 Oklahoma\n",
"36 157 Oregon\n",
"37 994 Pennsylvania\n",
"38 29 Rhode Island\n",
"39 622 South Carolina\n",
"40 52 South Dakota\n",
"41 753 Tennessee\n",
"42 2212 Texas\n",
"43 95 Utah\n",
"44 14 Vermont\n",
"45 531 Virginia\n",
"46 322 Washington\n",
"47 114 West Virginia\n",
"48 334 Wisconsin\n",
"49 25 Wyoming"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"homicides_with_states = homicides_with_states.drop(columns=[\"STATE\", \"Code\"])\n",
"homicides_with_states"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And just because it's easier to reference cleaned and ordered labels, let's rename 'DEATHS' to 'homicides' using `rename`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 654 | \n",
" Alabama | \n",
"
\n",
" \n",
" 1 | \n",
" 55 | \n",
" Alaska | \n",
"
\n",
" \n",
" 2 | \n",
" 525 | \n",
" Arizona | \n",
"
\n",
" \n",
" 3 | \n",
" 371 | \n",
" Arkansas | \n",
"
\n",
" \n",
" 4 | \n",
" 2368 | \n",
" California | \n",
"
\n",
" \n",
" 5 | \n",
" 336 | \n",
" Colorado | \n",
"
\n",
" \n",
" 6 | \n",
" 152 | \n",
" Connecticut | \n",
"
\n",
" \n",
" 7 | \n",
" 86 | \n",
" Delaware | \n",
"
\n",
" \n",
" 8 | \n",
" 1530 | \n",
" Florida | \n",
"
\n",
" \n",
" 9 | \n",
" 1093 | \n",
" Georgia | \n",
"
\n",
" \n",
" 10 | \n",
" 46 | \n",
" Hawaii | \n",
"
\n",
" \n",
" 11 | \n",
" 42 | \n",
" Idaho | \n",
"
\n",
" \n",
" 12 | \n",
" 1353 | \n",
" Illinois | \n",
"
\n",
" \n",
" 13 | \n",
" 620 | \n",
" Indiana | \n",
"
\n",
" \n",
" 14 | \n",
" 106 | \n",
" Iowa | \n",
"
\n",
" \n",
" 15 | \n",
" 195 | \n",
" Kansas | \n",
"
\n",
" \n",
" 16 | \n",
" 404 | \n",
" Kentucky | \n",
"
\n",
" \n",
" 17 | \n",
" 873 | \n",
" Louisiana | \n",
"
\n",
" \n",
" 18 | \n",
" 21 | \n",
" Maine | \n",
"
\n",
" \n",
" 19 | \n",
" 649 | \n",
" Maryland | \n",
"
\n",
" \n",
" 20 | \n",
" 183 | \n",
" Massachusetts | \n",
"
\n",
" \n",
" 21 | \n",
" 811 | \n",
" Michigan | \n",
"
\n",
" \n",
" 22 | \n",
" 198 | \n",
" Minnesota | \n",
"
\n",
" \n",
" 23 | \n",
" 576 | \n",
" Mississippi | \n",
"
\n",
" \n",
" 24 | \n",
" 803 | \n",
" Missouri | \n",
"
\n",
" \n",
" 25 | \n",
" 65 | \n",
" Montana | \n",
"
\n",
" \n",
" 26 | \n",
" 76 | \n",
" Nebraska | \n",
"
\n",
" \n",
" 27 | \n",
" 217 | \n",
" Nevada | \n",
"
\n",
" \n",
" 28 | \n",
" 14 | \n",
" New Hampshire | \n",
"
\n",
" \n",
" 29 | \n",
" 362 | \n",
" New Jersey | \n",
"
\n",
" \n",
" 30 | \n",
" 216 | \n",
" New Mexico | \n",
"
\n",
" \n",
" 31 | \n",
" 875 | \n",
" New York | \n",
"
\n",
" \n",
" 32 | \n",
" 884 | \n",
" North Carolina | \n",
"
\n",
" \n",
" 33 | \n",
" 30 | \n",
" North Dakota | \n",
"
\n",
" \n",
" 34 | \n",
" 1004 | \n",
" Ohio | \n",
"
\n",
" \n",
" 35 | \n",
" 342 | \n",
" Oklahoma | \n",
"
\n",
" \n",
" 36 | \n",
" 157 | \n",
" Oregon | \n",
"
\n",
" \n",
" 37 | \n",
" 994 | \n",
" Pennsylvania | \n",
"
\n",
" \n",
" 38 | \n",
" 29 | \n",
" Rhode Island | \n",
"
\n",
" \n",
" 39 | \n",
" 622 | \n",
" South Carolina | \n",
"
\n",
" \n",
" 40 | \n",
" 52 | \n",
" South Dakota | \n",
"
\n",
" \n",
" 41 | \n",
" 753 | \n",
" Tennessee | \n",
"
\n",
" \n",
" 42 | \n",
" 2212 | \n",
" Texas | \n",
"
\n",
" \n",
" 43 | \n",
" 95 | \n",
" Utah | \n",
"
\n",
" \n",
" 44 | \n",
" 14 | \n",
" Vermont | \n",
"
\n",
" \n",
" 45 | \n",
" 531 | \n",
" Virginia | \n",
"
\n",
" \n",
" 46 | \n",
" 322 | \n",
" Washington | \n",
"
\n",
" \n",
" 47 | \n",
" 114 | \n",
" West Virginia | \n",
"
\n",
" \n",
" 48 | \n",
" 334 | \n",
" Wisconsin | \n",
"
\n",
" \n",
" 49 | \n",
" 25 | \n",
" Wyoming | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state\n",
"0 654 Alabama\n",
"1 55 Alaska\n",
"2 525 Arizona\n",
"3 371 Arkansas\n",
"4 2368 California\n",
"5 336 Colorado\n",
"6 152 Connecticut\n",
"7 86 Delaware\n",
"8 1530 Florida\n",
"9 1093 Georgia\n",
"10 46 Hawaii\n",
"11 42 Idaho\n",
"12 1353 Illinois\n",
"13 620 Indiana\n",
"14 106 Iowa\n",
"15 195 Kansas\n",
"16 404 Kentucky\n",
"17 873 Louisiana\n",
"18 21 Maine\n",
"19 649 Maryland\n",
"20 183 Massachusetts\n",
"21 811 Michigan\n",
"22 198 Minnesota\n",
"23 576 Mississippi\n",
"24 803 Missouri\n",
"25 65 Montana\n",
"26 76 Nebraska\n",
"27 217 Nevada\n",
"28 14 New Hampshire\n",
"29 362 New Jersey\n",
"30 216 New Mexico\n",
"31 875 New York\n",
"32 884 North Carolina\n",
"33 30 North Dakota\n",
"34 1004 Ohio\n",
"35 342 Oklahoma\n",
"36 157 Oregon\n",
"37 994 Pennsylvania\n",
"38 29 Rhode Island\n",
"39 622 South Carolina\n",
"40 52 South Dakota\n",
"41 753 Tennessee\n",
"42 2212 Texas\n",
"43 95 Utah\n",
"44 14 Vermont\n",
"45 531 Virginia\n",
"46 322 Washington\n",
"47 114 West Virginia\n",
"48 334 Wisconsin\n",
"49 25 Wyoming"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"homicides_with_states = homicides_with_states.rename(\n",
" columns={\"DEATHS\": \"homicides\", \"State\": \"state\"}\n",
")\n",
"homicides_with_states"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great - now we're ready to merge in our population data. Let's perform an outer merge so that we keep all our data, then we can check what's missing and remove it as appropriate."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" NAME | \n",
" POPESTIMATE2020 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 654.0 | \n",
" Alabama | \n",
" Alabama | \n",
" 5024803 | \n",
"
\n",
" \n",
" 1 | \n",
" 55.0 | \n",
" Alaska | \n",
" Alaska | \n",
" 732441 | \n",
"
\n",
" \n",
" 2 | \n",
" 525.0 | \n",
" Arizona | \n",
" Arizona | \n",
" 7177986 | \n",
"
\n",
" \n",
" 3 | \n",
" 371.0 | \n",
" Arkansas | \n",
" Arkansas | \n",
" 3012232 | \n",
"
\n",
" \n",
" 4 | \n",
" 2368.0 | \n",
" California | \n",
" California | \n",
" 39499738 | \n",
"
\n",
" \n",
" 5 | \n",
" 336.0 | \n",
" Colorado | \n",
" Colorado | \n",
" 5784308 | \n",
"
\n",
" \n",
" 6 | \n",
" 152.0 | \n",
" Connecticut | \n",
" Connecticut | \n",
" 3600260 | \n",
"
\n",
" \n",
" 7 | \n",
" 86.0 | \n",
" Delaware | \n",
" Delaware | \n",
" 991886 | \n",
"
\n",
" \n",
" 8 | \n",
" 1530.0 | \n",
" Florida | \n",
" Florida | \n",
" 21569932 | \n",
"
\n",
" \n",
" 9 | \n",
" 1093.0 | \n",
" Georgia | \n",
" Georgia | \n",
" 10725800 | \n",
"
\n",
" \n",
" 10 | \n",
" 46.0 | \n",
" Hawaii | \n",
" Hawaii | \n",
" 1451911 | \n",
"
\n",
" \n",
" 11 | \n",
" 42.0 | \n",
" Idaho | \n",
" Idaho | \n",
" 1847772 | \n",
"
\n",
" \n",
" 12 | \n",
" 1353.0 | \n",
" Illinois | \n",
" Illinois | \n",
" 12785245 | \n",
"
\n",
" \n",
" 13 | \n",
" 620.0 | \n",
" Indiana | \n",
" Indiana | \n",
" 6785644 | \n",
"
\n",
" \n",
" 14 | \n",
" 106.0 | \n",
" Iowa | \n",
" Iowa | \n",
" 3188669 | \n",
"
\n",
" \n",
" 15 | \n",
" 195.0 | \n",
" Kansas | \n",
" Kansas | \n",
" 2935880 | \n",
"
\n",
" \n",
" 16 | \n",
" 404.0 | \n",
" Kentucky | \n",
" Kentucky | \n",
" 4503958 | \n",
"
\n",
" \n",
" 17 | \n",
" 873.0 | \n",
" Louisiana | \n",
" Louisiana | \n",
" 4651203 | \n",
"
\n",
" \n",
" 18 | \n",
" 21.0 | \n",
" Maine | \n",
" Maine | \n",
" 1362280 | \n",
"
\n",
" \n",
" 19 | \n",
" 649.0 | \n",
" Maryland | \n",
" Maryland | \n",
" 6172679 | \n",
"
\n",
" \n",
" 20 | \n",
" 183.0 | \n",
" Massachusetts | \n",
" Massachusetts | \n",
" 7022220 | \n",
"
\n",
" \n",
" 21 | \n",
" 811.0 | \n",
" Michigan | \n",
" Michigan | \n",
" 10067664 | \n",
"
\n",
" \n",
" 22 | \n",
" 198.0 | \n",
" Minnesota | \n",
" Minnesota | \n",
" 5707165 | \n",
"
\n",
" \n",
" 23 | \n",
" 576.0 | \n",
" Mississippi | \n",
" Mississippi | \n",
" 2956870 | \n",
"
\n",
" \n",
" 24 | \n",
" 803.0 | \n",
" Missouri | \n",
" Missouri | \n",
" 6154481 | \n",
"
\n",
" \n",
" 25 | \n",
" 65.0 | \n",
" Montana | \n",
" Montana | \n",
" 1086193 | \n",
"
\n",
" \n",
" 26 | \n",
" 76.0 | \n",
" Nebraska | \n",
" Nebraska | \n",
" 1961455 | \n",
"
\n",
" \n",
" 27 | \n",
" 217.0 | \n",
" Nevada | \n",
" Nevada | \n",
" 3114071 | \n",
"
\n",
" \n",
" 28 | \n",
" 14.0 | \n",
" New Hampshire | \n",
" New Hampshire | \n",
" 1377848 | \n",
"
\n",
" \n",
" 29 | \n",
" 362.0 | \n",
" New Jersey | \n",
" New Jersey | \n",
" 9279743 | \n",
"
\n",
" \n",
" 30 | \n",
" 216.0 | \n",
" New Mexico | \n",
" New Mexico | \n",
" 2117566 | \n",
"
\n",
" \n",
" 31 | \n",
" 875.0 | \n",
" New York | \n",
" New York | \n",
" 20154933 | \n",
"
\n",
" \n",
" 32 | \n",
" 884.0 | \n",
" North Carolina | \n",
" North Carolina | \n",
" 10457177 | \n",
"
\n",
" \n",
" 33 | \n",
" 30.0 | \n",
" North Dakota | \n",
" North Dakota | \n",
" 778962 | \n",
"
\n",
" \n",
" 34 | \n",
" 1004.0 | \n",
" Ohio | \n",
" Ohio | \n",
" 11790587 | \n",
"
\n",
" \n",
" 35 | \n",
" 342.0 | \n",
" Oklahoma | \n",
" Oklahoma | \n",
" 3962031 | \n",
"
\n",
" \n",
" 36 | \n",
" 157.0 | \n",
" Oregon | \n",
" Oregon | \n",
" 4241544 | \n",
"
\n",
" \n",
" 37 | \n",
" 994.0 | \n",
" Pennsylvania | \n",
" Pennsylvania | \n",
" 12989625 | \n",
"
\n",
" \n",
" 38 | \n",
" 29.0 | \n",
" Rhode Island | \n",
" Rhode Island | \n",
" 1096229 | \n",
"
\n",
" \n",
" 39 | \n",
" 622.0 | \n",
" South Carolina | \n",
" South Carolina | \n",
" 5130729 | \n",
"
\n",
" \n",
" 40 | \n",
" 52.0 | \n",
" South Dakota | \n",
" South Dakota | \n",
" 887099 | \n",
"
\n",
" \n",
" 41 | \n",
" 753.0 | \n",
" Tennessee | \n",
" Tennessee | \n",
" 6920119 | \n",
"
\n",
" \n",
" 42 | \n",
" 2212.0 | \n",
" Texas | \n",
" Texas | \n",
" 29217653 | \n",
"
\n",
" \n",
" 43 | \n",
" 95.0 | \n",
" Utah | \n",
" Utah | \n",
" 3281684 | \n",
"
\n",
" \n",
" 44 | \n",
" 14.0 | \n",
" Vermont | \n",
" Vermont | \n",
" 642495 | \n",
"
\n",
" \n",
" 45 | \n",
" 531.0 | \n",
" Virginia | \n",
" Virginia | \n",
" 8632044 | \n",
"
\n",
" \n",
" 46 | \n",
" 322.0 | \n",
" Washington | \n",
" Washington | \n",
" 7718785 | \n",
"
\n",
" \n",
" 47 | \n",
" 114.0 | \n",
" West Virginia | \n",
" West Virginia | \n",
" 1789798 | \n",
"
\n",
" \n",
" 48 | \n",
" 334.0 | \n",
" Wisconsin | \n",
" Wisconsin | \n",
" 5892323 | \n",
"
\n",
" \n",
" 49 | \n",
" 25.0 | \n",
" Wyoming | \n",
" Wyoming | \n",
" 577267 | \n",
"
\n",
" \n",
" 50 | \n",
" NaN | \n",
" NaN | \n",
" United States | \n",
" 331501080 | \n",
"
\n",
" \n",
" 51 | \n",
" NaN | \n",
" NaN | \n",
" Northeast Region | \n",
" 57525633 | \n",
"
\n",
" \n",
" 52 | \n",
" NaN | \n",
" NaN | \n",
" Midwest Region | \n",
" 68935174 | \n",
"
\n",
" \n",
" 53 | \n",
" NaN | \n",
" NaN | \n",
" South Region | \n",
" 126409007 | \n",
"
\n",
" \n",
" 54 | \n",
" NaN | \n",
" NaN | \n",
" West Region | \n",
" 78631266 | \n",
"
\n",
" \n",
" 55 | \n",
" NaN | \n",
" NaN | \n",
" District of Columbia | \n",
" 690093 | \n",
"
\n",
" \n",
" 56 | \n",
" NaN | \n",
" NaN | \n",
" Puerto Rico | \n",
" 3281538 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state NAME POPESTIMATE2020\n",
"0 654.0 Alabama Alabama 5024803\n",
"1 55.0 Alaska Alaska 732441\n",
"2 525.0 Arizona Arizona 7177986\n",
"3 371.0 Arkansas Arkansas 3012232\n",
"4 2368.0 California California 39499738\n",
"5 336.0 Colorado Colorado 5784308\n",
"6 152.0 Connecticut Connecticut 3600260\n",
"7 86.0 Delaware Delaware 991886\n",
"8 1530.0 Florida Florida 21569932\n",
"9 1093.0 Georgia Georgia 10725800\n",
"10 46.0 Hawaii Hawaii 1451911\n",
"11 42.0 Idaho Idaho 1847772\n",
"12 1353.0 Illinois Illinois 12785245\n",
"13 620.0 Indiana Indiana 6785644\n",
"14 106.0 Iowa Iowa 3188669\n",
"15 195.0 Kansas Kansas 2935880\n",
"16 404.0 Kentucky Kentucky 4503958\n",
"17 873.0 Louisiana Louisiana 4651203\n",
"18 21.0 Maine Maine 1362280\n",
"19 649.0 Maryland Maryland 6172679\n",
"20 183.0 Massachusetts Massachusetts 7022220\n",
"21 811.0 Michigan Michigan 10067664\n",
"22 198.0 Minnesota Minnesota 5707165\n",
"23 576.0 Mississippi Mississippi 2956870\n",
"24 803.0 Missouri Missouri 6154481\n",
"25 65.0 Montana Montana 1086193\n",
"26 76.0 Nebraska Nebraska 1961455\n",
"27 217.0 Nevada Nevada 3114071\n",
"28 14.0 New Hampshire New Hampshire 1377848\n",
"29 362.0 New Jersey New Jersey 9279743\n",
"30 216.0 New Mexico New Mexico 2117566\n",
"31 875.0 New York New York 20154933\n",
"32 884.0 North Carolina North Carolina 10457177\n",
"33 30.0 North Dakota North Dakota 778962\n",
"34 1004.0 Ohio Ohio 11790587\n",
"35 342.0 Oklahoma Oklahoma 3962031\n",
"36 157.0 Oregon Oregon 4241544\n",
"37 994.0 Pennsylvania Pennsylvania 12989625\n",
"38 29.0 Rhode Island Rhode Island 1096229\n",
"39 622.0 South Carolina South Carolina 5130729\n",
"40 52.0 South Dakota South Dakota 887099\n",
"41 753.0 Tennessee Tennessee 6920119\n",
"42 2212.0 Texas Texas 29217653\n",
"43 95.0 Utah Utah 3281684\n",
"44 14.0 Vermont Vermont 642495\n",
"45 531.0 Virginia Virginia 8632044\n",
"46 322.0 Washington Washington 7718785\n",
"47 114.0 West Virginia West Virginia 1789798\n",
"48 334.0 Wisconsin Wisconsin 5892323\n",
"49 25.0 Wyoming Wyoming 577267\n",
"50 NaN NaN United States 331501080\n",
"51 NaN NaN Northeast Region 57525633\n",
"52 NaN NaN Midwest Region 68935174\n",
"53 NaN NaN South Region 126409007\n",
"54 NaN NaN West Region 78631266\n",
"55 NaN NaN District of Columbia 690093\n",
"56 NaN NaN Puerto Rico 3281538"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined = pd.merge(\n",
" homicides_with_states, population, how=\"outer\", left_on=\"state\", right_on=\"NAME\"\n",
")\n",
"combined"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the above that we have a number of `NaN` values. Let's look at rows with any `NaN` values:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" NAME | \n",
" POPESTIMATE2020 | \n",
"
\n",
" \n",
" \n",
" \n",
" 50 | \n",
" NaN | \n",
" NaN | \n",
" United States | \n",
" 331501080 | \n",
"
\n",
" \n",
" 51 | \n",
" NaN | \n",
" NaN | \n",
" Northeast Region | \n",
" 57525633 | \n",
"
\n",
" \n",
" 52 | \n",
" NaN | \n",
" NaN | \n",
" Midwest Region | \n",
" 68935174 | \n",
"
\n",
" \n",
" 53 | \n",
" NaN | \n",
" NaN | \n",
" South Region | \n",
" 126409007 | \n",
"
\n",
" \n",
" 54 | \n",
" NaN | \n",
" NaN | \n",
" West Region | \n",
" 78631266 | \n",
"
\n",
" \n",
" 55 | \n",
" NaN | \n",
" NaN | \n",
" District of Columbia | \n",
" 690093 | \n",
"
\n",
" \n",
" 56 | \n",
" NaN | \n",
" NaN | \n",
" Puerto Rico | \n",
" 3281538 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state NAME POPESTIMATE2020\n",
"50 NaN NaN United States 331501080\n",
"51 NaN NaN Northeast Region 57525633\n",
"52 NaN NaN Midwest Region 68935174\n",
"53 NaN NaN South Region 126409007\n",
"54 NaN NaN West Region 78631266\n",
"55 NaN NaN District of Columbia 690093\n",
"56 NaN NaN Puerto Rico 3281538"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined[combined.isna().any(axis=1)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's just break down the above statement for a moment for comprehension. `combined.isna()` returns a True value for each entry in the DataFrame with a `NaN` value. The `.any(axis=1)` looks across axis 1 and if any of the entries in that row (across all columns) has a True value, then that row is represented with a True. Let's look at the steps so it's clear:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" NAME | \n",
" POPESTIMATE2020 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 1 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 2 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 3 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 4 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 5 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 6 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 7 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 8 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 9 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 10 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 11 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 12 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 13 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 14 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 15 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 16 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 17 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 18 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 19 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 20 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 21 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 22 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 23 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 24 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 25 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 26 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 27 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 28 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 29 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 30 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 31 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 32 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 33 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 34 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 35 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 36 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 37 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 38 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 39 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 40 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 41 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 42 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 43 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 44 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 45 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 46 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 47 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 48 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 49 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 50 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 51 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 52 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 53 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 54 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 55 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 56 | \n",
" True | \n",
" True | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state NAME POPESTIMATE2020\n",
"0 False False False False\n",
"1 False False False False\n",
"2 False False False False\n",
"3 False False False False\n",
"4 False False False False\n",
"5 False False False False\n",
"6 False False False False\n",
"7 False False False False\n",
"8 False False False False\n",
"9 False False False False\n",
"10 False False False False\n",
"11 False False False False\n",
"12 False False False False\n",
"13 False False False False\n",
"14 False False False False\n",
"15 False False False False\n",
"16 False False False False\n",
"17 False False False False\n",
"18 False False False False\n",
"19 False False False False\n",
"20 False False False False\n",
"21 False False False False\n",
"22 False False False False\n",
"23 False False False False\n",
"24 False False False False\n",
"25 False False False False\n",
"26 False False False False\n",
"27 False False False False\n",
"28 False False False False\n",
"29 False False False False\n",
"30 False False False False\n",
"31 False False False False\n",
"32 False False False False\n",
"33 False False False False\n",
"34 False False False False\n",
"35 False False False False\n",
"36 False False False False\n",
"37 False False False False\n",
"38 False False False False\n",
"39 False False False False\n",
"40 False False False False\n",
"41 False False False False\n",
"42 False False False False\n",
"43 False False False False\n",
"44 False False False False\n",
"45 False False False False\n",
"46 False False False False\n",
"47 False False False False\n",
"48 False False False False\n",
"49 False False False False\n",
"50 True True False False\n",
"51 True True False False\n",
"52 True True False False\n",
"53 True True False False\n",
"54 True True False False\n",
"55 True True False False\n",
"56 True True False False"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined.isna()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 False\n",
"13 False\n",
"14 False\n",
"15 False\n",
"16 False\n",
"17 False\n",
"18 False\n",
"19 False\n",
"20 False\n",
"21 False\n",
"22 False\n",
"23 False\n",
"24 False\n",
"25 False\n",
"26 False\n",
"27 False\n",
"28 False\n",
"29 False\n",
"30 False\n",
"31 False\n",
"32 False\n",
"33 False\n",
"34 False\n",
"35 False\n",
"36 False\n",
"37 False\n",
"38 False\n",
"39 False\n",
"40 False\n",
"41 False\n",
"42 False\n",
"43 False\n",
"44 False\n",
"45 False\n",
"46 False\n",
"47 False\n",
"48 False\n",
"49 False\n",
"50 True\n",
"51 True\n",
"52 True\n",
"53 True\n",
"54 True\n",
"55 True\n",
"56 True\n",
"dtype: bool"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined.isna().any(axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So when we put it altogether, it only displaces those rows with `NaN` values:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" NAME | \n",
" POPESTIMATE2020 | \n",
"
\n",
" \n",
" \n",
" \n",
" 50 | \n",
" NaN | \n",
" NaN | \n",
" United States | \n",
" 331501080 | \n",
"
\n",
" \n",
" 51 | \n",
" NaN | \n",
" NaN | \n",
" Northeast Region | \n",
" 57525633 | \n",
"
\n",
" \n",
" 52 | \n",
" NaN | \n",
" NaN | \n",
" Midwest Region | \n",
" 68935174 | \n",
"
\n",
" \n",
" 53 | \n",
" NaN | \n",
" NaN | \n",
" South Region | \n",
" 126409007 | \n",
"
\n",
" \n",
" 54 | \n",
" NaN | \n",
" NaN | \n",
" West Region | \n",
" 78631266 | \n",
"
\n",
" \n",
" 55 | \n",
" NaN | \n",
" NaN | \n",
" District of Columbia | \n",
" 690093 | \n",
"
\n",
" \n",
" 56 | \n",
" NaN | \n",
" NaN | \n",
" Puerto Rico | \n",
" 3281538 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state NAME POPESTIMATE2020\n",
"50 NaN NaN United States 331501080\n",
"51 NaN NaN Northeast Region 57525633\n",
"52 NaN NaN Midwest Region 68935174\n",
"53 NaN NaN South Region 126409007\n",
"54 NaN NaN West Region 78631266\n",
"55 NaN NaN District of Columbia 690093\n",
"56 NaN NaN Puerto Rico 3281538"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined[combined.isna().any(axis=1)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we can see here is that we don't need any of these regions, as none are U.S. states, so let's drop them using the `dropna` method:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" NAME | \n",
" POPESTIMATE2020 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 654.0 | \n",
" Alabama | \n",
" Alabama | \n",
" 5024803 | \n",
"
\n",
" \n",
" 1 | \n",
" 55.0 | \n",
" Alaska | \n",
" Alaska | \n",
" 732441 | \n",
"
\n",
" \n",
" 2 | \n",
" 525.0 | \n",
" Arizona | \n",
" Arizona | \n",
" 7177986 | \n",
"
\n",
" \n",
" 3 | \n",
" 371.0 | \n",
" Arkansas | \n",
" Arkansas | \n",
" 3012232 | \n",
"
\n",
" \n",
" 4 | \n",
" 2368.0 | \n",
" California | \n",
" California | \n",
" 39499738 | \n",
"
\n",
" \n",
" 5 | \n",
" 336.0 | \n",
" Colorado | \n",
" Colorado | \n",
" 5784308 | \n",
"
\n",
" \n",
" 6 | \n",
" 152.0 | \n",
" Connecticut | \n",
" Connecticut | \n",
" 3600260 | \n",
"
\n",
" \n",
" 7 | \n",
" 86.0 | \n",
" Delaware | \n",
" Delaware | \n",
" 991886 | \n",
"
\n",
" \n",
" 8 | \n",
" 1530.0 | \n",
" Florida | \n",
" Florida | \n",
" 21569932 | \n",
"
\n",
" \n",
" 9 | \n",
" 1093.0 | \n",
" Georgia | \n",
" Georgia | \n",
" 10725800 | \n",
"
\n",
" \n",
" 10 | \n",
" 46.0 | \n",
" Hawaii | \n",
" Hawaii | \n",
" 1451911 | \n",
"
\n",
" \n",
" 11 | \n",
" 42.0 | \n",
" Idaho | \n",
" Idaho | \n",
" 1847772 | \n",
"
\n",
" \n",
" 12 | \n",
" 1353.0 | \n",
" Illinois | \n",
" Illinois | \n",
" 12785245 | \n",
"
\n",
" \n",
" 13 | \n",
" 620.0 | \n",
" Indiana | \n",
" Indiana | \n",
" 6785644 | \n",
"
\n",
" \n",
" 14 | \n",
" 106.0 | \n",
" Iowa | \n",
" Iowa | \n",
" 3188669 | \n",
"
\n",
" \n",
" 15 | \n",
" 195.0 | \n",
" Kansas | \n",
" Kansas | \n",
" 2935880 | \n",
"
\n",
" \n",
" 16 | \n",
" 404.0 | \n",
" Kentucky | \n",
" Kentucky | \n",
" 4503958 | \n",
"
\n",
" \n",
" 17 | \n",
" 873.0 | \n",
" Louisiana | \n",
" Louisiana | \n",
" 4651203 | \n",
"
\n",
" \n",
" 18 | \n",
" 21.0 | \n",
" Maine | \n",
" Maine | \n",
" 1362280 | \n",
"
\n",
" \n",
" 19 | \n",
" 649.0 | \n",
" Maryland | \n",
" Maryland | \n",
" 6172679 | \n",
"
\n",
" \n",
" 20 | \n",
" 183.0 | \n",
" Massachusetts | \n",
" Massachusetts | \n",
" 7022220 | \n",
"
\n",
" \n",
" 21 | \n",
" 811.0 | \n",
" Michigan | \n",
" Michigan | \n",
" 10067664 | \n",
"
\n",
" \n",
" 22 | \n",
" 198.0 | \n",
" Minnesota | \n",
" Minnesota | \n",
" 5707165 | \n",
"
\n",
" \n",
" 23 | \n",
" 576.0 | \n",
" Mississippi | \n",
" Mississippi | \n",
" 2956870 | \n",
"
\n",
" \n",
" 24 | \n",
" 803.0 | \n",
" Missouri | \n",
" Missouri | \n",
" 6154481 | \n",
"
\n",
" \n",
" 25 | \n",
" 65.0 | \n",
" Montana | \n",
" Montana | \n",
" 1086193 | \n",
"
\n",
" \n",
" 26 | \n",
" 76.0 | \n",
" Nebraska | \n",
" Nebraska | \n",
" 1961455 | \n",
"
\n",
" \n",
" 27 | \n",
" 217.0 | \n",
" Nevada | \n",
" Nevada | \n",
" 3114071 | \n",
"
\n",
" \n",
" 28 | \n",
" 14.0 | \n",
" New Hampshire | \n",
" New Hampshire | \n",
" 1377848 | \n",
"
\n",
" \n",
" 29 | \n",
" 362.0 | \n",
" New Jersey | \n",
" New Jersey | \n",
" 9279743 | \n",
"
\n",
" \n",
" 30 | \n",
" 216.0 | \n",
" New Mexico | \n",
" New Mexico | \n",
" 2117566 | \n",
"
\n",
" \n",
" 31 | \n",
" 875.0 | \n",
" New York | \n",
" New York | \n",
" 20154933 | \n",
"
\n",
" \n",
" 32 | \n",
" 884.0 | \n",
" North Carolina | \n",
" North Carolina | \n",
" 10457177 | \n",
"
\n",
" \n",
" 33 | \n",
" 30.0 | \n",
" North Dakota | \n",
" North Dakota | \n",
" 778962 | \n",
"
\n",
" \n",
" 34 | \n",
" 1004.0 | \n",
" Ohio | \n",
" Ohio | \n",
" 11790587 | \n",
"
\n",
" \n",
" 35 | \n",
" 342.0 | \n",
" Oklahoma | \n",
" Oklahoma | \n",
" 3962031 | \n",
"
\n",
" \n",
" 36 | \n",
" 157.0 | \n",
" Oregon | \n",
" Oregon | \n",
" 4241544 | \n",
"
\n",
" \n",
" 37 | \n",
" 994.0 | \n",
" Pennsylvania | \n",
" Pennsylvania | \n",
" 12989625 | \n",
"
\n",
" \n",
" 38 | \n",
" 29.0 | \n",
" Rhode Island | \n",
" Rhode Island | \n",
" 1096229 | \n",
"
\n",
" \n",
" 39 | \n",
" 622.0 | \n",
" South Carolina | \n",
" South Carolina | \n",
" 5130729 | \n",
"
\n",
" \n",
" 40 | \n",
" 52.0 | \n",
" South Dakota | \n",
" South Dakota | \n",
" 887099 | \n",
"
\n",
" \n",
" 41 | \n",
" 753.0 | \n",
" Tennessee | \n",
" Tennessee | \n",
" 6920119 | \n",
"
\n",
" \n",
" 42 | \n",
" 2212.0 | \n",
" Texas | \n",
" Texas | \n",
" 29217653 | \n",
"
\n",
" \n",
" 43 | \n",
" 95.0 | \n",
" Utah | \n",
" Utah | \n",
" 3281684 | \n",
"
\n",
" \n",
" 44 | \n",
" 14.0 | \n",
" Vermont | \n",
" Vermont | \n",
" 642495 | \n",
"
\n",
" \n",
" 45 | \n",
" 531.0 | \n",
" Virginia | \n",
" Virginia | \n",
" 8632044 | \n",
"
\n",
" \n",
" 46 | \n",
" 322.0 | \n",
" Washington | \n",
" Washington | \n",
" 7718785 | \n",
"
\n",
" \n",
" 47 | \n",
" 114.0 | \n",
" West Virginia | \n",
" West Virginia | \n",
" 1789798 | \n",
"
\n",
" \n",
" 48 | \n",
" 334.0 | \n",
" Wisconsin | \n",
" Wisconsin | \n",
" 5892323 | \n",
"
\n",
" \n",
" 49 | \n",
" 25.0 | \n",
" Wyoming | \n",
" Wyoming | \n",
" 577267 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state NAME POPESTIMATE2020\n",
"0 654.0 Alabama Alabama 5024803\n",
"1 55.0 Alaska Alaska 732441\n",
"2 525.0 Arizona Arizona 7177986\n",
"3 371.0 Arkansas Arkansas 3012232\n",
"4 2368.0 California California 39499738\n",
"5 336.0 Colorado Colorado 5784308\n",
"6 152.0 Connecticut Connecticut 3600260\n",
"7 86.0 Delaware Delaware 991886\n",
"8 1530.0 Florida Florida 21569932\n",
"9 1093.0 Georgia Georgia 10725800\n",
"10 46.0 Hawaii Hawaii 1451911\n",
"11 42.0 Idaho Idaho 1847772\n",
"12 1353.0 Illinois Illinois 12785245\n",
"13 620.0 Indiana Indiana 6785644\n",
"14 106.0 Iowa Iowa 3188669\n",
"15 195.0 Kansas Kansas 2935880\n",
"16 404.0 Kentucky Kentucky 4503958\n",
"17 873.0 Louisiana Louisiana 4651203\n",
"18 21.0 Maine Maine 1362280\n",
"19 649.0 Maryland Maryland 6172679\n",
"20 183.0 Massachusetts Massachusetts 7022220\n",
"21 811.0 Michigan Michigan 10067664\n",
"22 198.0 Minnesota Minnesota 5707165\n",
"23 576.0 Mississippi Mississippi 2956870\n",
"24 803.0 Missouri Missouri 6154481\n",
"25 65.0 Montana Montana 1086193\n",
"26 76.0 Nebraska Nebraska 1961455\n",
"27 217.0 Nevada Nevada 3114071\n",
"28 14.0 New Hampshire New Hampshire 1377848\n",
"29 362.0 New Jersey New Jersey 9279743\n",
"30 216.0 New Mexico New Mexico 2117566\n",
"31 875.0 New York New York 20154933\n",
"32 884.0 North Carolina North Carolina 10457177\n",
"33 30.0 North Dakota North Dakota 778962\n",
"34 1004.0 Ohio Ohio 11790587\n",
"35 342.0 Oklahoma Oklahoma 3962031\n",
"36 157.0 Oregon Oregon 4241544\n",
"37 994.0 Pennsylvania Pennsylvania 12989625\n",
"38 29.0 Rhode Island Rhode Island 1096229\n",
"39 622.0 South Carolina South Carolina 5130729\n",
"40 52.0 South Dakota South Dakota 887099\n",
"41 753.0 Tennessee Tennessee 6920119\n",
"42 2212.0 Texas Texas 29217653\n",
"43 95.0 Utah Utah 3281684\n",
"44 14.0 Vermont Vermont 642495\n",
"45 531.0 Virginia Virginia 8632044\n",
"46 322.0 Washington Washington 7718785\n",
"47 114.0 West Virginia West Virginia 1789798\n",
"48 334.0 Wisconsin Wisconsin 5892323\n",
"49 25.0 Wyoming Wyoming 577267"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined = combined.dropna(axis=0)\n",
"combined"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have our data let's take one more step to clean the formatting of our data. Let's drop the extra column with state names and rename 'POPESTIMATE2020' to 'population':"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" population | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 654.0 | \n",
" Alabama | \n",
" 5024803 | \n",
"
\n",
" \n",
" 1 | \n",
" 55.0 | \n",
" Alaska | \n",
" 732441 | \n",
"
\n",
" \n",
" 2 | \n",
" 525.0 | \n",
" Arizona | \n",
" 7177986 | \n",
"
\n",
" \n",
" 3 | \n",
" 371.0 | \n",
" Arkansas | \n",
" 3012232 | \n",
"
\n",
" \n",
" 4 | \n",
" 2368.0 | \n",
" California | \n",
" 39499738 | \n",
"
\n",
" \n",
" 5 | \n",
" 336.0 | \n",
" Colorado | \n",
" 5784308 | \n",
"
\n",
" \n",
" 6 | \n",
" 152.0 | \n",
" Connecticut | \n",
" 3600260 | \n",
"
\n",
" \n",
" 7 | \n",
" 86.0 | \n",
" Delaware | \n",
" 991886 | \n",
"
\n",
" \n",
" 8 | \n",
" 1530.0 | \n",
" Florida | \n",
" 21569932 | \n",
"
\n",
" \n",
" 9 | \n",
" 1093.0 | \n",
" Georgia | \n",
" 10725800 | \n",
"
\n",
" \n",
" 10 | \n",
" 46.0 | \n",
" Hawaii | \n",
" 1451911 | \n",
"
\n",
" \n",
" 11 | \n",
" 42.0 | \n",
" Idaho | \n",
" 1847772 | \n",
"
\n",
" \n",
" 12 | \n",
" 1353.0 | \n",
" Illinois | \n",
" 12785245 | \n",
"
\n",
" \n",
" 13 | \n",
" 620.0 | \n",
" Indiana | \n",
" 6785644 | \n",
"
\n",
" \n",
" 14 | \n",
" 106.0 | \n",
" Iowa | \n",
" 3188669 | \n",
"
\n",
" \n",
" 15 | \n",
" 195.0 | \n",
" Kansas | \n",
" 2935880 | \n",
"
\n",
" \n",
" 16 | \n",
" 404.0 | \n",
" Kentucky | \n",
" 4503958 | \n",
"
\n",
" \n",
" 17 | \n",
" 873.0 | \n",
" Louisiana | \n",
" 4651203 | \n",
"
\n",
" \n",
" 18 | \n",
" 21.0 | \n",
" Maine | \n",
" 1362280 | \n",
"
\n",
" \n",
" 19 | \n",
" 649.0 | \n",
" Maryland | \n",
" 6172679 | \n",
"
\n",
" \n",
" 20 | \n",
" 183.0 | \n",
" Massachusetts | \n",
" 7022220 | \n",
"
\n",
" \n",
" 21 | \n",
" 811.0 | \n",
" Michigan | \n",
" 10067664 | \n",
"
\n",
" \n",
" 22 | \n",
" 198.0 | \n",
" Minnesota | \n",
" 5707165 | \n",
"
\n",
" \n",
" 23 | \n",
" 576.0 | \n",
" Mississippi | \n",
" 2956870 | \n",
"
\n",
" \n",
" 24 | \n",
" 803.0 | \n",
" Missouri | \n",
" 6154481 | \n",
"
\n",
" \n",
" 25 | \n",
" 65.0 | \n",
" Montana | \n",
" 1086193 | \n",
"
\n",
" \n",
" 26 | \n",
" 76.0 | \n",
" Nebraska | \n",
" 1961455 | \n",
"
\n",
" \n",
" 27 | \n",
" 217.0 | \n",
" Nevada | \n",
" 3114071 | \n",
"
\n",
" \n",
" 28 | \n",
" 14.0 | \n",
" New Hampshire | \n",
" 1377848 | \n",
"
\n",
" \n",
" 29 | \n",
" 362.0 | \n",
" New Jersey | \n",
" 9279743 | \n",
"
\n",
" \n",
" 30 | \n",
" 216.0 | \n",
" New Mexico | \n",
" 2117566 | \n",
"
\n",
" \n",
" 31 | \n",
" 875.0 | \n",
" New York | \n",
" 20154933 | \n",
"
\n",
" \n",
" 32 | \n",
" 884.0 | \n",
" North Carolina | \n",
" 10457177 | \n",
"
\n",
" \n",
" 33 | \n",
" 30.0 | \n",
" North Dakota | \n",
" 778962 | \n",
"
\n",
" \n",
" 34 | \n",
" 1004.0 | \n",
" Ohio | \n",
" 11790587 | \n",
"
\n",
" \n",
" 35 | \n",
" 342.0 | \n",
" Oklahoma | \n",
" 3962031 | \n",
"
\n",
" \n",
" 36 | \n",
" 157.0 | \n",
" Oregon | \n",
" 4241544 | \n",
"
\n",
" \n",
" 37 | \n",
" 994.0 | \n",
" Pennsylvania | \n",
" 12989625 | \n",
"
\n",
" \n",
" 38 | \n",
" 29.0 | \n",
" Rhode Island | \n",
" 1096229 | \n",
"
\n",
" \n",
" 39 | \n",
" 622.0 | \n",
" South Carolina | \n",
" 5130729 | \n",
"
\n",
" \n",
" 40 | \n",
" 52.0 | \n",
" South Dakota | \n",
" 887099 | \n",
"
\n",
" \n",
" 41 | \n",
" 753.0 | \n",
" Tennessee | \n",
" 6920119 | \n",
"
\n",
" \n",
" 42 | \n",
" 2212.0 | \n",
" Texas | \n",
" 29217653 | \n",
"
\n",
" \n",
" 43 | \n",
" 95.0 | \n",
" Utah | \n",
" 3281684 | \n",
"
\n",
" \n",
" 44 | \n",
" 14.0 | \n",
" Vermont | \n",
" 642495 | \n",
"
\n",
" \n",
" 45 | \n",
" 531.0 | \n",
" Virginia | \n",
" 8632044 | \n",
"
\n",
" \n",
" 46 | \n",
" 322.0 | \n",
" Washington | \n",
" 7718785 | \n",
"
\n",
" \n",
" 47 | \n",
" 114.0 | \n",
" West Virginia | \n",
" 1789798 | \n",
"
\n",
" \n",
" 48 | \n",
" 334.0 | \n",
" Wisconsin | \n",
" 5892323 | \n",
"
\n",
" \n",
" 49 | \n",
" 25.0 | \n",
" Wyoming | \n",
" 577267 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state population\n",
"0 654.0 Alabama 5024803\n",
"1 55.0 Alaska 732441\n",
"2 525.0 Arizona 7177986\n",
"3 371.0 Arkansas 3012232\n",
"4 2368.0 California 39499738\n",
"5 336.0 Colorado 5784308\n",
"6 152.0 Connecticut 3600260\n",
"7 86.0 Delaware 991886\n",
"8 1530.0 Florida 21569932\n",
"9 1093.0 Georgia 10725800\n",
"10 46.0 Hawaii 1451911\n",
"11 42.0 Idaho 1847772\n",
"12 1353.0 Illinois 12785245\n",
"13 620.0 Indiana 6785644\n",
"14 106.0 Iowa 3188669\n",
"15 195.0 Kansas 2935880\n",
"16 404.0 Kentucky 4503958\n",
"17 873.0 Louisiana 4651203\n",
"18 21.0 Maine 1362280\n",
"19 649.0 Maryland 6172679\n",
"20 183.0 Massachusetts 7022220\n",
"21 811.0 Michigan 10067664\n",
"22 198.0 Minnesota 5707165\n",
"23 576.0 Mississippi 2956870\n",
"24 803.0 Missouri 6154481\n",
"25 65.0 Montana 1086193\n",
"26 76.0 Nebraska 1961455\n",
"27 217.0 Nevada 3114071\n",
"28 14.0 New Hampshire 1377848\n",
"29 362.0 New Jersey 9279743\n",
"30 216.0 New Mexico 2117566\n",
"31 875.0 New York 20154933\n",
"32 884.0 North Carolina 10457177\n",
"33 30.0 North Dakota 778962\n",
"34 1004.0 Ohio 11790587\n",
"35 342.0 Oklahoma 3962031\n",
"36 157.0 Oregon 4241544\n",
"37 994.0 Pennsylvania 12989625\n",
"38 29.0 Rhode Island 1096229\n",
"39 622.0 South Carolina 5130729\n",
"40 52.0 South Dakota 887099\n",
"41 753.0 Tennessee 6920119\n",
"42 2212.0 Texas 29217653\n",
"43 95.0 Utah 3281684\n",
"44 14.0 Vermont 642495\n",
"45 531.0 Virginia 8632044\n",
"46 322.0 Washington 7718785\n",
"47 114.0 West Virginia 1789798\n",
"48 334.0 Wisconsin 5892323\n",
"49 25.0 Wyoming 577267"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined = combined.drop(columns=[\"NAME\"])\n",
"combined = combined.rename(columns={\"POPESTIMATE2020\": \"population\"})\n",
"combined"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we're ready to conduct our analysis. Let's create a new column called 'homicide_rate' and it will be the number of homicides divided by the state population. Since this number is typically a small value, this is often reported as the number of homicides per 100,000 people, so we just multiply the result by 100,000."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" population | \n",
" homicide_rate | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 654.0 | \n",
" Alabama | \n",
" 5024803 | \n",
" 13.015436 | \n",
"
\n",
" \n",
" 1 | \n",
" 55.0 | \n",
" Alaska | \n",
" 732441 | \n",
" 7.509137 | \n",
"
\n",
" \n",
" 2 | \n",
" 525.0 | \n",
" Arizona | \n",
" 7177986 | \n",
" 7.314029 | \n",
"
\n",
" \n",
" 3 | \n",
" 371.0 | \n",
" Arkansas | \n",
" 3012232 | \n",
" 12.316448 | \n",
"
\n",
" \n",
" 4 | \n",
" 2368.0 | \n",
" California | \n",
" 39499738 | \n",
" 5.994976 | \n",
"
\n",
" \n",
" 5 | \n",
" 336.0 | \n",
" Colorado | \n",
" 5784308 | \n",
" 5.808819 | \n",
"
\n",
" \n",
" 6 | \n",
" 152.0 | \n",
" Connecticut | \n",
" 3600260 | \n",
" 4.221917 | \n",
"
\n",
" \n",
" 7 | \n",
" 86.0 | \n",
" Delaware | \n",
" 991886 | \n",
" 8.670351 | \n",
"
\n",
" \n",
" 8 | \n",
" 1530.0 | \n",
" Florida | \n",
" 21569932 | \n",
" 7.093207 | \n",
"
\n",
" \n",
" 9 | \n",
" 1093.0 | \n",
" Georgia | \n",
" 10725800 | \n",
" 10.190382 | \n",
"
\n",
" \n",
" 10 | \n",
" 46.0 | \n",
" Hawaii | \n",
" 1451911 | \n",
" 3.168238 | \n",
"
\n",
" \n",
" 11 | \n",
" 42.0 | \n",
" Idaho | \n",
" 1847772 | \n",
" 2.273008 | \n",
"
\n",
" \n",
" 12 | \n",
" 1353.0 | \n",
" Illinois | \n",
" 12785245 | \n",
" 10.582511 | \n",
"
\n",
" \n",
" 13 | \n",
" 620.0 | \n",
" Indiana | \n",
" 6785644 | \n",
" 9.136937 | \n",
"
\n",
" \n",
" 14 | \n",
" 106.0 | \n",
" Iowa | \n",
" 3188669 | \n",
" 3.324271 | \n",
"
\n",
" \n",
" 15 | \n",
" 195.0 | \n",
" Kansas | \n",
" 2935880 | \n",
" 6.641961 | \n",
"
\n",
" \n",
" 16 | \n",
" 404.0 | \n",
" Kentucky | \n",
" 4503958 | \n",
" 8.969888 | \n",
"
\n",
" \n",
" 17 | \n",
" 873.0 | \n",
" Louisiana | \n",
" 4651203 | \n",
" 18.769338 | \n",
"
\n",
" \n",
" 18 | \n",
" 21.0 | \n",
" Maine | \n",
" 1362280 | \n",
" 1.541533 | \n",
"
\n",
" \n",
" 19 | \n",
" 649.0 | \n",
" Maryland | \n",
" 6172679 | \n",
" 10.514073 | \n",
"
\n",
" \n",
" 20 | \n",
" 183.0 | \n",
" Massachusetts | \n",
" 7022220 | \n",
" 2.606013 | \n",
"
\n",
" \n",
" 21 | \n",
" 811.0 | \n",
" Michigan | \n",
" 10067664 | \n",
" 8.055493 | \n",
"
\n",
" \n",
" 22 | \n",
" 198.0 | \n",
" Minnesota | \n",
" 5707165 | \n",
" 3.469323 | \n",
"
\n",
" \n",
" 23 | \n",
" 576.0 | \n",
" Mississippi | \n",
" 2956870 | \n",
" 19.480058 | \n",
"
\n",
" \n",
" 24 | \n",
" 803.0 | \n",
" Missouri | \n",
" 6154481 | \n",
" 13.047404 | \n",
"
\n",
" \n",
" 25 | \n",
" 65.0 | \n",
" Montana | \n",
" 1086193 | \n",
" 5.984204 | \n",
"
\n",
" \n",
" 26 | \n",
" 76.0 | \n",
" Nebraska | \n",
" 1961455 | \n",
" 3.874675 | \n",
"
\n",
" \n",
" 27 | \n",
" 217.0 | \n",
" Nevada | \n",
" 3114071 | \n",
" 6.968370 | \n",
"
\n",
" \n",
" 28 | \n",
" 14.0 | \n",
" New Hampshire | \n",
" 1377848 | \n",
" 1.016077 | \n",
"
\n",
" \n",
" 29 | \n",
" 362.0 | \n",
" New Jersey | \n",
" 9279743 | \n",
" 3.900970 | \n",
"
\n",
" \n",
" 30 | \n",
" 216.0 | \n",
" New Mexico | \n",
" 2117566 | \n",
" 10.200390 | \n",
"
\n",
" \n",
" 31 | \n",
" 875.0 | \n",
" New York | \n",
" 20154933 | \n",
" 4.341369 | \n",
"
\n",
" \n",
" 32 | \n",
" 884.0 | \n",
" North Carolina | \n",
" 10457177 | \n",
" 8.453524 | \n",
"
\n",
" \n",
" 33 | \n",
" 30.0 | \n",
" North Dakota | \n",
" 778962 | \n",
" 3.851279 | \n",
"
\n",
" \n",
" 34 | \n",
" 1004.0 | \n",
" Ohio | \n",
" 11790587 | \n",
" 8.515267 | \n",
"
\n",
" \n",
" 35 | \n",
" 342.0 | \n",
" Oklahoma | \n",
" 3962031 | \n",
" 8.631936 | \n",
"
\n",
" \n",
" 36 | \n",
" 157.0 | \n",
" Oregon | \n",
" 4241544 | \n",
" 3.701482 | \n",
"
\n",
" \n",
" 37 | \n",
" 994.0 | \n",
" Pennsylvania | \n",
" 12989625 | \n",
" 7.652261 | \n",
"
\n",
" \n",
" 38 | \n",
" 29.0 | \n",
" Rhode Island | \n",
" 1096229 | \n",
" 2.645433 | \n",
"
\n",
" \n",
" 39 | \n",
" 622.0 | \n",
" South Carolina | \n",
" 5130729 | \n",
" 12.123034 | \n",
"
\n",
" \n",
" 40 | \n",
" 52.0 | \n",
" South Dakota | \n",
" 887099 | \n",
" 5.861803 | \n",
"
\n",
" \n",
" 41 | \n",
" 753.0 | \n",
" Tennessee | \n",
" 6920119 | \n",
" 10.881316 | \n",
"
\n",
" \n",
" 42 | \n",
" 2212.0 | \n",
" Texas | \n",
" 29217653 | \n",
" 7.570766 | \n",
"
\n",
" \n",
" 43 | \n",
" 95.0 | \n",
" Utah | \n",
" 3281684 | \n",
" 2.894855 | \n",
"
\n",
" \n",
" 44 | \n",
" 14.0 | \n",
" Vermont | \n",
" 642495 | \n",
" 2.179005 | \n",
"
\n",
" \n",
" 45 | \n",
" 531.0 | \n",
" Virginia | \n",
" 8632044 | \n",
" 6.151498 | \n",
"
\n",
" \n",
" 46 | \n",
" 322.0 | \n",
" Washington | \n",
" 7718785 | \n",
" 4.171641 | \n",
"
\n",
" \n",
" 47 | \n",
" 114.0 | \n",
" West Virginia | \n",
" 1789798 | \n",
" 6.369434 | \n",
"
\n",
" \n",
" 48 | \n",
" 334.0 | \n",
" Wisconsin | \n",
" 5892323 | \n",
" 5.668393 | \n",
"
\n",
" \n",
" 49 | \n",
" 25.0 | \n",
" Wyoming | \n",
" 577267 | \n",
" 4.330752 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state population homicide_rate\n",
"0 654.0 Alabama 5024803 13.015436\n",
"1 55.0 Alaska 732441 7.509137\n",
"2 525.0 Arizona 7177986 7.314029\n",
"3 371.0 Arkansas 3012232 12.316448\n",
"4 2368.0 California 39499738 5.994976\n",
"5 336.0 Colorado 5784308 5.808819\n",
"6 152.0 Connecticut 3600260 4.221917\n",
"7 86.0 Delaware 991886 8.670351\n",
"8 1530.0 Florida 21569932 7.093207\n",
"9 1093.0 Georgia 10725800 10.190382\n",
"10 46.0 Hawaii 1451911 3.168238\n",
"11 42.0 Idaho 1847772 2.273008\n",
"12 1353.0 Illinois 12785245 10.582511\n",
"13 620.0 Indiana 6785644 9.136937\n",
"14 106.0 Iowa 3188669 3.324271\n",
"15 195.0 Kansas 2935880 6.641961\n",
"16 404.0 Kentucky 4503958 8.969888\n",
"17 873.0 Louisiana 4651203 18.769338\n",
"18 21.0 Maine 1362280 1.541533\n",
"19 649.0 Maryland 6172679 10.514073\n",
"20 183.0 Massachusetts 7022220 2.606013\n",
"21 811.0 Michigan 10067664 8.055493\n",
"22 198.0 Minnesota 5707165 3.469323\n",
"23 576.0 Mississippi 2956870 19.480058\n",
"24 803.0 Missouri 6154481 13.047404\n",
"25 65.0 Montana 1086193 5.984204\n",
"26 76.0 Nebraska 1961455 3.874675\n",
"27 217.0 Nevada 3114071 6.968370\n",
"28 14.0 New Hampshire 1377848 1.016077\n",
"29 362.0 New Jersey 9279743 3.900970\n",
"30 216.0 New Mexico 2117566 10.200390\n",
"31 875.0 New York 20154933 4.341369\n",
"32 884.0 North Carolina 10457177 8.453524\n",
"33 30.0 North Dakota 778962 3.851279\n",
"34 1004.0 Ohio 11790587 8.515267\n",
"35 342.0 Oklahoma 3962031 8.631936\n",
"36 157.0 Oregon 4241544 3.701482\n",
"37 994.0 Pennsylvania 12989625 7.652261\n",
"38 29.0 Rhode Island 1096229 2.645433\n",
"39 622.0 South Carolina 5130729 12.123034\n",
"40 52.0 South Dakota 887099 5.861803\n",
"41 753.0 Tennessee 6920119 10.881316\n",
"42 2212.0 Texas 29217653 7.570766\n",
"43 95.0 Utah 3281684 2.894855\n",
"44 14.0 Vermont 642495 2.179005\n",
"45 531.0 Virginia 8632044 6.151498\n",
"46 322.0 Washington 7718785 4.171641\n",
"47 114.0 West Virginia 1789798 6.369434\n",
"48 334.0 Wisconsin 5892323 5.668393\n",
"49 25.0 Wyoming 577267 4.330752"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined[\"homicide_rate\"] = combined[\"homicides\"] / combined[\"population\"] * 100000\n",
"combined"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, let's sort the data by 'homicide_rate' to see what the states are with the highest and lowest rates:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" homicides | \n",
" state | \n",
" population | \n",
" homicide_rate | \n",
"
\n",
" \n",
" \n",
" \n",
" 28 | \n",
" 14.0 | \n",
" New Hampshire | \n",
" 1377848 | \n",
" 1.016077 | \n",
"
\n",
" \n",
" 18 | \n",
" 21.0 | \n",
" Maine | \n",
" 1362280 | \n",
" 1.541533 | \n",
"
\n",
" \n",
" 44 | \n",
" 14.0 | \n",
" Vermont | \n",
" 642495 | \n",
" 2.179005 | \n",
"
\n",
" \n",
" 11 | \n",
" 42.0 | \n",
" Idaho | \n",
" 1847772 | \n",
" 2.273008 | \n",
"
\n",
" \n",
" 20 | \n",
" 183.0 | \n",
" Massachusetts | \n",
" 7022220 | \n",
" 2.606013 | \n",
"
\n",
" \n",
" 38 | \n",
" 29.0 | \n",
" Rhode Island | \n",
" 1096229 | \n",
" 2.645433 | \n",
"
\n",
" \n",
" 43 | \n",
" 95.0 | \n",
" Utah | \n",
" 3281684 | \n",
" 2.894855 | \n",
"
\n",
" \n",
" 10 | \n",
" 46.0 | \n",
" Hawaii | \n",
" 1451911 | \n",
" 3.168238 | \n",
"
\n",
" \n",
" 14 | \n",
" 106.0 | \n",
" Iowa | \n",
" 3188669 | \n",
" 3.324271 | \n",
"
\n",
" \n",
" 22 | \n",
" 198.0 | \n",
" Minnesota | \n",
" 5707165 | \n",
" 3.469323 | \n",
"
\n",
" \n",
" 36 | \n",
" 157.0 | \n",
" Oregon | \n",
" 4241544 | \n",
" 3.701482 | \n",
"
\n",
" \n",
" 33 | \n",
" 30.0 | \n",
" North Dakota | \n",
" 778962 | \n",
" 3.851279 | \n",
"
\n",
" \n",
" 26 | \n",
" 76.0 | \n",
" Nebraska | \n",
" 1961455 | \n",
" 3.874675 | \n",
"
\n",
" \n",
" 29 | \n",
" 362.0 | \n",
" New Jersey | \n",
" 9279743 | \n",
" 3.900970 | \n",
"
\n",
" \n",
" 46 | \n",
" 322.0 | \n",
" Washington | \n",
" 7718785 | \n",
" 4.171641 | \n",
"
\n",
" \n",
" 6 | \n",
" 152.0 | \n",
" Connecticut | \n",
" 3600260 | \n",
" 4.221917 | \n",
"
\n",
" \n",
" 49 | \n",
" 25.0 | \n",
" Wyoming | \n",
" 577267 | \n",
" 4.330752 | \n",
"
\n",
" \n",
" 31 | \n",
" 875.0 | \n",
" New York | \n",
" 20154933 | \n",
" 4.341369 | \n",
"
\n",
" \n",
" 48 | \n",
" 334.0 | \n",
" Wisconsin | \n",
" 5892323 | \n",
" 5.668393 | \n",
"
\n",
" \n",
" 5 | \n",
" 336.0 | \n",
" Colorado | \n",
" 5784308 | \n",
" 5.808819 | \n",
"
\n",
" \n",
" 40 | \n",
" 52.0 | \n",
" South Dakota | \n",
" 887099 | \n",
" 5.861803 | \n",
"
\n",
" \n",
" 25 | \n",
" 65.0 | \n",
" Montana | \n",
" 1086193 | \n",
" 5.984204 | \n",
"
\n",
" \n",
" 4 | \n",
" 2368.0 | \n",
" California | \n",
" 39499738 | \n",
" 5.994976 | \n",
"
\n",
" \n",
" 45 | \n",
" 531.0 | \n",
" Virginia | \n",
" 8632044 | \n",
" 6.151498 | \n",
"
\n",
" \n",
" 47 | \n",
" 114.0 | \n",
" West Virginia | \n",
" 1789798 | \n",
" 6.369434 | \n",
"
\n",
" \n",
" 15 | \n",
" 195.0 | \n",
" Kansas | \n",
" 2935880 | \n",
" 6.641961 | \n",
"
\n",
" \n",
" 27 | \n",
" 217.0 | \n",
" Nevada | \n",
" 3114071 | \n",
" 6.968370 | \n",
"
\n",
" \n",
" 8 | \n",
" 1530.0 | \n",
" Florida | \n",
" 21569932 | \n",
" 7.093207 | \n",
"
\n",
" \n",
" 2 | \n",
" 525.0 | \n",
" Arizona | \n",
" 7177986 | \n",
" 7.314029 | \n",
"
\n",
" \n",
" 1 | \n",
" 55.0 | \n",
" Alaska | \n",
" 732441 | \n",
" 7.509137 | \n",
"
\n",
" \n",
" 42 | \n",
" 2212.0 | \n",
" Texas | \n",
" 29217653 | \n",
" 7.570766 | \n",
"
\n",
" \n",
" 37 | \n",
" 994.0 | \n",
" Pennsylvania | \n",
" 12989625 | \n",
" 7.652261 | \n",
"
\n",
" \n",
" 21 | \n",
" 811.0 | \n",
" Michigan | \n",
" 10067664 | \n",
" 8.055493 | \n",
"
\n",
" \n",
" 32 | \n",
" 884.0 | \n",
" North Carolina | \n",
" 10457177 | \n",
" 8.453524 | \n",
"
\n",
" \n",
" 34 | \n",
" 1004.0 | \n",
" Ohio | \n",
" 11790587 | \n",
" 8.515267 | \n",
"
\n",
" \n",
" 35 | \n",
" 342.0 | \n",
" Oklahoma | \n",
" 3962031 | \n",
" 8.631936 | \n",
"
\n",
" \n",
" 7 | \n",
" 86.0 | \n",
" Delaware | \n",
" 991886 | \n",
" 8.670351 | \n",
"
\n",
" \n",
" 16 | \n",
" 404.0 | \n",
" Kentucky | \n",
" 4503958 | \n",
" 8.969888 | \n",
"
\n",
" \n",
" 13 | \n",
" 620.0 | \n",
" Indiana | \n",
" 6785644 | \n",
" 9.136937 | \n",
"
\n",
" \n",
" 9 | \n",
" 1093.0 | \n",
" Georgia | \n",
" 10725800 | \n",
" 10.190382 | \n",
"
\n",
" \n",
" 30 | \n",
" 216.0 | \n",
" New Mexico | \n",
" 2117566 | \n",
" 10.200390 | \n",
"
\n",
" \n",
" 19 | \n",
" 649.0 | \n",
" Maryland | \n",
" 6172679 | \n",
" 10.514073 | \n",
"
\n",
" \n",
" 12 | \n",
" 1353.0 | \n",
" Illinois | \n",
" 12785245 | \n",
" 10.582511 | \n",
"
\n",
" \n",
" 41 | \n",
" 753.0 | \n",
" Tennessee | \n",
" 6920119 | \n",
" 10.881316 | \n",
"
\n",
" \n",
" 39 | \n",
" 622.0 | \n",
" South Carolina | \n",
" 5130729 | \n",
" 12.123034 | \n",
"
\n",
" \n",
" 3 | \n",
" 371.0 | \n",
" Arkansas | \n",
" 3012232 | \n",
" 12.316448 | \n",
"
\n",
" \n",
" 0 | \n",
" 654.0 | \n",
" Alabama | \n",
" 5024803 | \n",
" 13.015436 | \n",
"
\n",
" \n",
" 24 | \n",
" 803.0 | \n",
" Missouri | \n",
" 6154481 | \n",
" 13.047404 | \n",
"
\n",
" \n",
" 17 | \n",
" 873.0 | \n",
" Louisiana | \n",
" 4651203 | \n",
" 18.769338 | \n",
"
\n",
" \n",
" 23 | \n",
" 576.0 | \n",
" Mississippi | \n",
" 2956870 | \n",
" 19.480058 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" homicides state population homicide_rate\n",
"28 14.0 New Hampshire 1377848 1.016077\n",
"18 21.0 Maine 1362280 1.541533\n",
"44 14.0 Vermont 642495 2.179005\n",
"11 42.0 Idaho 1847772 2.273008\n",
"20 183.0 Massachusetts 7022220 2.606013\n",
"38 29.0 Rhode Island 1096229 2.645433\n",
"43 95.0 Utah 3281684 2.894855\n",
"10 46.0 Hawaii 1451911 3.168238\n",
"14 106.0 Iowa 3188669 3.324271\n",
"22 198.0 Minnesota 5707165 3.469323\n",
"36 157.0 Oregon 4241544 3.701482\n",
"33 30.0 North Dakota 778962 3.851279\n",
"26 76.0 Nebraska 1961455 3.874675\n",
"29 362.0 New Jersey 9279743 3.900970\n",
"46 322.0 Washington 7718785 4.171641\n",
"6 152.0 Connecticut 3600260 4.221917\n",
"49 25.0 Wyoming 577267 4.330752\n",
"31 875.0 New York 20154933 4.341369\n",
"48 334.0 Wisconsin 5892323 5.668393\n",
"5 336.0 Colorado 5784308 5.808819\n",
"40 52.0 South Dakota 887099 5.861803\n",
"25 65.0 Montana 1086193 5.984204\n",
"4 2368.0 California 39499738 5.994976\n",
"45 531.0 Virginia 8632044 6.151498\n",
"47 114.0 West Virginia 1789798 6.369434\n",
"15 195.0 Kansas 2935880 6.641961\n",
"27 217.0 Nevada 3114071 6.968370\n",
"8 1530.0 Florida 21569932 7.093207\n",
"2 525.0 Arizona 7177986 7.314029\n",
"1 55.0 Alaska 732441 7.509137\n",
"42 2212.0 Texas 29217653 7.570766\n",
"37 994.0 Pennsylvania 12989625 7.652261\n",
"21 811.0 Michigan 10067664 8.055493\n",
"32 884.0 North Carolina 10457177 8.453524\n",
"34 1004.0 Ohio 11790587 8.515267\n",
"35 342.0 Oklahoma 3962031 8.631936\n",
"7 86.0 Delaware 991886 8.670351\n",
"16 404.0 Kentucky 4503958 8.969888\n",
"13 620.0 Indiana 6785644 9.136937\n",
"9 1093.0 Georgia 10725800 10.190382\n",
"30 216.0 New Mexico 2117566 10.200390\n",
"19 649.0 Maryland 6172679 10.514073\n",
"12 1353.0 Illinois 12785245 10.582511\n",
"41 753.0 Tennessee 6920119 10.881316\n",
"39 622.0 South Carolina 5130729 12.123034\n",
"3 371.0 Arkansas 3012232 12.316448\n",
"0 654.0 Alabama 5024803 13.015436\n",
"24 803.0 Missouri 6154481 13.047404\n",
"17 873.0 Louisiana 4651203 18.769338\n",
"23 576.0 Mississippi 2956870 19.480058"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"combined = combined.sort_values(by=\"homicide_rate\")\n",
"combined"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the states of New Hampshire, Maine, Vermont, Idaho, and Massachusetts, each have homicide rates between and 1 and 2.6 people per 100,000. Arkansas, Alabama, Missouri, Louisiana, and Mississippi have far higher rates ranging from 12.3 to 19.5 people per 100,000. The homicide rate is nearly 20 times higher in Mississippi than it is in New Hampshire.\n",
"\n",
"All of the skills and tools shown here scale up to larger datasets as well. As we can see, data preparation was a significant part of the effort and being able to programmatically complete the process was essential. Data preparation requires careful design to take care of potential imperfections, omissions, or mismatches in the data. Once taken care of, the subsequent analysis can proceed for us to derive insights from our data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"In this section, we saw that merging is a key tool for combining datasets and is able to accommodate many of the common types of merges that we would want to execute. Merging data allows us to integrate the information from two DataFrames logically and coherently to prepare it for further analysis and querying, but care must be taken in structuring the merge so that we don't lose data or introduce unwanted `NaN` values. Next, you'll get some practice merging data. Once we have our data combined, in an upcoming lesson, we'll explore how to group our data by its columns."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.13 ('ds')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a8f7447a67778e188a9289d5afd8f46b343fa90d5cd10b8a7763899ed796f96a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}