Practical Data Science in Python

Welcome to Practical Data Science! This is the beginning of a textbook based on content developed, in significant part, for the Duke MIDS Practical Data Science course IDS 720 by Kyle Bradbury and Nick Eubank.

We are in the process of developing this content into both a Python Data Science Foundations Specialization on Coursera (in collaboration with Drew Hilton and Genevieve Lipp) and a stand-alone textbook. At the same time, I will continue to use this site for my IDS 720 class, so if you're not in the class, feel free to jump ahead to whatever interests you!

Duke MIDS Practical Data Science IDS 720

Interested in enrolling in IDS 720 but not a MIDS student? Read this!

Data science is an intrinsically applied field, and yet all too often students are taught the advanced math and statistics behind data science tools but are left to fend for themselves when it comes to learning the tools we use to do data science day to day, or to learning how to manage real projects. This course is designed to fill that gap.

Practical Data Science is a flipped-classroom, exercise- and project-focused course. It is designed to give students practical experience manipulating and analyzing real (often messy, error-ridden, and poorly documented) data using the full range of bread-and-butter Python data science tools (like the command line, git, Python (especially numpy and pandas), Jupyter notebooks, and more). By the end of the course, students will be able to:

  • Manipulate and analyze data in any format, including cleaning, merging, and summarizing tabular data in all standard formats and at all levels of cleanliness, as well as working with large datasets and GIS data,

  • Identify and resolve data issues using defensive programming practices,

  • Set up and manage a data science programming environment on their own computers, including installing Python, managing packages with pip and conda, setting PATH variables, and working with VS Code,

  • Collaborate effectively with colleagues using git and GitHub,

  • Plan and execute a full data science project from planning data manipulations through analysis and presentation of findings.
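As a small taste of the merging and defensive-programming skills listed above, here is a minimal sketch in pandas. It is an illustration, not material taken from the course itself: the `people` and `incomes` tables and their columns are hypothetical, made-up data. The idea is to state your assumptions about the data as explicit checks (unique merge keys, every row matched, plausible values) so that problems surface as errors instead of silently corrupting the analysis.

```python
import pandas as pd

# Hypothetical, made-up tables used only for illustration.
people = pd.DataFrame({"person_id": [1, 2, 3], "county": ["Durham", "Wake", "Orange"]})
incomes = pd.DataFrame({"person_id": [1, 2, 3], "income": [52000, 61000, 48000]})

# Defensive check: the merge key should uniquely identify rows in each table.
assert people["person_id"].is_unique
assert incomes["person_id"].is_unique

# validate="1:1" makes pandas raise a MergeError if the key is duplicated on
# either side; indicator=True adds a "_merge" column recording each row's source.
merged = pd.merge(
    people, incomes, on="person_id", how="outer", validate="1:1", indicator=True
)

# Defensive check: every row should have found a match in both tables.
assert (merged["_merge"] == "both").all(), "some rows failed to match"

# Defensive check: values should be in a plausible range before analysis.
assert (merged["income"] >= 0).all(), "found negative incomes"

# Only after the checks pass do we summarize.
print(merged.groupby("county")["income"].mean())
```

Using an outer merge together with `indicator=True` is a deliberate choice here: an inner merge would quietly drop unmatched rows, whereas the outer merge keeps them so the assertion can catch them.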

Syllabus

The full syllabus for this course can be downloaded here. Please note that this syllabus is subject to change.

Class Schedule

The (tentative) class schedule can be found here.

Questions or comments?

Please let me know! All source files (and underlying Jupyter notebooks) for this site can be found on GitHub, where you can raise issues by creating a new issue. You can also email me at nick@nickeubank.com.