Plotting Exercises, Part 1#
At this location, you will find 13 datasets, all with similar structure. Your goal will be to write code that loops over these datasets, calculates and prints summary statistics, and later draws a scatter plot of each one.
Exercise 1#
Load the first dataset and get a feel for the data.
Note that the file being downloaded is not actually a CSV file. It is tab-delimited, meaning that within each row, columns are separated by tabs rather than commas. We communicate this to pandas with the delimiter="\t" option ("\t" is how we write a tab, as we will discuss in future lessons).
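As a small illustration of the delimiter option, the sketch below reads tab-delimited text from an in-memory string rather than from the real download location. The three rows of data are invented for the example; only the column names x_var and y_var come from these exercises.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file: a header row plus
# three rows, with columns separated by tabs ("\t").
raw_text = "x_var\ty_var\n1.0\t2.5\n2.0\t3.5\n3.0\t5.0\n"

# delimiter="\t" tells pandas to split each row on tabs instead of commas.
df = pd.read_csv(io.StringIO(raw_text), delimiter="\t")

print(df.shape)
print(df.columns.tolist())
```

With a real file, you would pass the file's path or URL as the first argument instead of the io.StringIO object.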
Exercise 2#
To get a better sense of what these datasets look like, write a loop that iterates over each example dataset (numbered 1 to 13) and prints out the mean and standard deviation of x_var and y_var, along with their correlation, for each dataset.
For example, the first iteration of this loop might return something like:
Example Dataset 1:
Mean x: 23.12321978429576,
Mean y: 98.23980921730972,
Std Dev x: 21.2389710287,
Std Dev y: 32.2389081209832,
Correlation: 0.73892819281
(Though you shouldn’t get those specific values. You might get values that are quite similar across datasets.)
Hint: When writing this type of code, it is often best to start by writing code to do what you want for the first iteration of the loop. Or, as Drew and Genevieve would say, WORK ONE CASE BY HAND! Once you have code that works for the first example dataset, then write the full loop around it.
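One way to follow this hint is sketched below using made-up data, so it does not give away the real results. The helper name summarize and the dictionary fake_datasets are hypothetical, and the random numbers merely stand in for the downloaded files. The statistics are first computed for a single dataset, and only then is the same code wrapped in a loop.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Return summary statistics for one dataset as a dictionary."""
    return {
        "mean_x": df["x_var"].mean(),
        "mean_y": df["y_var"].mean(),
        "std_x": df["x_var"].std(),  # pandas uses the sample std (ddof=1)
        "std_y": df["y_var"].std(),
        "corr": df["x_var"].corr(df["y_var"]),  # Pearson correlation
    }


# Made-up stand-ins for the real datasets (assumption: same column names).
rng = np.random.default_rng(0)
fake_datasets = {}
for i in range(1, 4):
    x = rng.normal(size=50)
    y = 2 * x + rng.normal(size=50)
    fake_datasets[i] = pd.DataFrame({"x_var": x, "y_var": y})

# Step 1: work one case by hand.
first = summarize(fake_datasets[1])
print(first)

# Step 2: once that works, write the loop around it.
for i, df in fake_datasets.items():
    stats = summarize(df)
    print(f"Example Dataset {i}:")
    print(f"Mean x: {stats['mean_x']}, Mean y: {stats['mean_y']}")
    print(f"Std Dev x: {stats['std_x']}, Std Dev y: {stats['std_y']}")
    print(f"Correlation: {stats['corr']}")
```

For the real exercise, the body of the loop would load each file instead of looking it up in a dictionary.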
Exercise 3#
Based only on these results, discuss with your partner what you might conclude about these example datasets. Write down your thoughts.
Exercise 4#
Write a loop that iterates over these example datasets and plots a simple scatter plot of each dataset with the x variable on the x-axis and the y variable on the y-axis.
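The general shape of such a loop might look like the sketch below. It uses randomly generated stand-in data (the dictionary fake_datasets is hypothetical), so the plots it produces will look nothing like the real ones. It creates one figure per dataset and keeps the figures in a list.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for the real datasets (assumption: same column names).
rng = np.random.default_rng(1)
fake_datasets = {
    i: pd.DataFrame({"x_var": rng.normal(size=30), "y_var": rng.normal(size=30)})
    for i in range(1, 4)
}

figures = []
for i, df in fake_datasets.items():
    # A fresh figure and axes for each dataset.
    fig, ax = plt.subplots()
    ax.scatter(df["x_var"], df["y_var"])
    ax.set_title(f"Example Dataset {i}")
    ax.set_xlabel("x_var")
    ax.set_ylabel("y_var")
    figures.append(fig)
```

In a notebook, the figures are displayed automatically at the end of the cell; in a script, you would call plt.show() or save each figure with fig.savefig.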
Exercise 5#
Review your plots. How does your impression of these datasets differ from what you wrote down in Exercise 3?
Economic Development and… Your Choice!#
Exercise 6#
Load the World Development Indicator data here
Rather than picking a single year, pick a single country and look at how GDP per capita and one of the other variables in that dataset have evolved together over time. For now, plot the two series as two separate plots. Be sure your figure includes a title and x- and y-axis labels in plain English (not variable names).
Make any adjustments to the functional forms of your variables and/or axes needed to make the figure legible.
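As one example of such an adjustment, a series that grows roughly exponentially is often easier to read on a logarithmic y-axis. The sketch below uses invented numbers (not the World Development Indicator data) to show a log-scaled axis together with a title and plain-English labels. Whether a log scale suits your own variable is a judgment call.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up series that grows 7% per year (for illustration only).
years = np.arange(1990, 2021)
gdp_per_capita = 1_000 * 1.07 ** (years - 1990)

fig, ax = plt.subplots()
ax.plot(years, gdp_per_capita)

# A log scale turns constant percentage growth into a straight line.
ax.set_yscale("log")

ax.set_title("GDP per capita over time (made-up data)")
ax.set_xlabel("Year")
ax.set_ylabel("GDP per capita (log scale)")
```

An alternative is to transform the variable itself with np.log before plotting, which changes the tick labels to log units rather than original units.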
Exercise 7#
Let’s put the two series together so that they share a common x-axis of time (so both series are plotted in the same plot). Obviously, your different series will need different y-axes.
Use your detective skills (and some guess and check work) to figure out how to get it to work!
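If you get stuck, one approach that may help is matplotlib's twinx method, which creates a second set of axes sharing the same x-axis but with its own y-axis on the right. The sketch below uses invented series, and the names ax_left and ax_right are arbitrary. Treat it as a starting point rather than the intended solution.

```python
import matplotlib.pyplot as plt
import numpy as np

# Two made-up series on very different scales (for illustration only).
years = np.arange(1990, 2021)
series_a = 1_000 * 1.05 ** (years - 1990)
series_b = 60 + 0.3 * (years - 1990)

fig, ax_left = plt.subplots()
ax_left.plot(years, series_a, color="tab:blue")
ax_left.set_xlabel("Year")
ax_left.set_ylabel("First series", color="tab:blue")

# twinx() adds a second y-axis on the right that shares the x-axis.
ax_right = ax_left.twinx()
ax_right.plot(years, series_b, color="tab:orange")
ax_right.set_ylabel("Second series", color="tab:orange")

ax_left.set_title("Two made-up series sharing a time axis")
```

Coloring each y-axis label to match its line is one simple way to show which axis belongs to which series.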
Exercise 8#
LLMs are great for plotting, and they’re really good at matplotlib (it’s been around forever, so there’s lots of training data). Using GitHub Copilot Chat in VS Code, make the following changes to your plot:
Add a note to the bottom of the plot in the bottom left that reports the source for your data.
Change the x-axis so that it prints out the year every 5 years (1990, 1995, 2000, etc.).
Add a red, dashed, vertical line at the year 2000.
Add text that is rotated 90 degrees, in red, alongside that vertical line in the middle of the plot that says “New Millennium.”
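For comparing against whatever the LLM suggests, here is a minimal sketch of matplotlib calls that can produce these four changes. It uses a made-up series and a placeholder source note; the text positions and font size are arbitrary choices, and an LLM may reasonably pick different functions to get the same result.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

# Made-up series (for illustration only).
years = np.arange(1990, 2021)
values = 1_000 * 1.05 ** (years - 1990)

fig, ax = plt.subplots()
ax.plot(years, values)

# Put a major tick (and label) at every multiple of 5 on the x-axis.
ax.xaxis.set_major_locator(MultipleLocator(5))

# Red, dashed, vertical line at the year 2000.
ax.axvline(x=2000, color="red", linestyle="--")

# Rotated red text next to the line. The blended transform means x is in
# data units (the year) and y is a fraction of the axes height (0.5 = middle).
ax.text(
    2000, 0.5, "New Millennium",
    color="red", rotation=90,
    transform=ax.get_xaxis_transform(),
    ha="right", va="center",
)

# Source note in the bottom-left corner, in figure coordinates.
fig.text(0.01, 0.01, "Source: made-up data for illustration",
         ha="left", va="bottom", fontsize=8)
```

Reading through code like this line by line is a useful check that an LLM's output does what you asked for.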