Understanding and Managing Python Packages

We have been using a Python package extensively (numpy) and have mentioned that packages matter: if a well-tested package already solves your problem, you should use it before trying to develop your own. Two questions arise when talking about Python packages: (1) what is a package (and how does it differ from a module or a script)? and (2) how do we manage packages in Python?

Packages and modules and scripts, oh my!

By this point, you may have a number of terms swirling in your head - words like scripts, modules, and packages. Let’s refresh ourselves on what each of these terms means.

  1. Script. When you start coding, you may run a few commands and realize that you’d be better off writing them down and running them together. The resulting file may contain function or variable definitions, or classes (with their corresponding methods or properties, the object-oriented analogs of functions and variables). This is typically a script: everything in one file (.py or .ipynb).

  2. Module. As your script gets long enough or complicated enough, you may want to start splitting up your code and organizing it into separate files with separate variable, function, or class definitions. A module is such a file that contains definitions that can be imported into other modules or scripts for reuse.

  3. Package. Packages are a way of organizing and giving structure to your modules and the module namespace. For example, the package numpy contains the module random, so if you want to use the function rand, you call it using package.module.function() notation, which in this case is numpy.random.rand(). A package is typically a folder that contains multiple modules (.py files).
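To make the distinction concrete, here is a minimal sketch that builds a tiny package in a temporary folder and calls one of its functions with package.module.function() notation. The names toy_shapes, area, and rectangle are hypothetical and chosen only for illustration; the empty __init__.py file is what marks the folder as a regular package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Lay out a tiny, hypothetical package on disk:
#   toy_shapes/          <- package (a folder)
#       __init__.py      <- marks the folder as a regular package
#       area.py          <- module (a single .py file with definitions)
root = Path(tempfile.mkdtemp())
package_dir = root / "toy_shapes"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "area.py").write_text(
    "def rectangle(width, height):\n"
    "    return width * height\n"
)

# Make the temporary folder importable, then import the module.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
import toy_shapes.area

# package.module.function() notation, just like numpy.random.rand()
result = toy_shapes.area.rectangle(3, 4)
print(result)  # 12
```

A file that only runs these lines from top to bottom is acting as a script, area.py is a module, and the toy_shapes folder is a package.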

Python packages have to be installed and managed from outside of Python on the command line. There are two major tools for installing and managing packages in Python: conda and pip.

conda

Conda is the package management tool created by Anaconda, and is usually the place to start when you want to install something.

To install a package using conda, go to your command line and type conda install [package name]. For example, if you want to install pandas, you would type conda install pandas. Conda will then work out what it needs to install to give you pandas. This includes installing packages that pandas relies on, or updating software you already have installed to make it compatible with pandas. It will form a plan, then ask whether you want it to proceed. It won’t do anything until you say “yes”.

Conda can also be used to install a specific version of a package (e.g. conda install scipy=0.15.0), remove packages (conda remove [package name]), and update packages (conda update [package name]). You can see all the software you have installed with conda list.
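As a complement to conda list, you can also ask Python itself which version of a package it sees. The following is a minimal sketch using the standard library's importlib.metadata (available in Python 3.8 and later); the helper name installed_version is hypothetical, and the package names are only examples whose results depend on what is installed on your machine.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example package names; the output depends on your own installation.
for name in ["numpy", "pandas", "scipy"]:
    found = installed_version(name)
    print(name, "->", found if found is not None else "not installed")
```

This only reports what the currently running Python can find, so inside a conda environment it reflects that environment rather than every installation on your computer.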

Environments: Conda also supports what are called environments. An environment is like an isolated installation of Python living in a little bubble on your computer, where you can experiment with packages without breaking your “working” installation, keep installations of different versions of Python side by side, and more. We won’t get into those here, but they are worth reading up on if you’d like.

pip

pip is the older Python package manager, and it only manages Python packages. There are still some packages that you can install with pip but not with conda. We recommend you start with conda. pip has a similar syntax to conda: just run pip install [package name] from the command line.
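The similarity between the two tools can be sketched with a small helper that builds the install command for either one without running it. The function install_command is hypothetical and exists only for illustration. One difference it highlights: when pinning a version, conda accepts a single equals sign (as in conda install scipy=0.15.0), while pip uses a double equals sign.

```python
def install_command(tool, package, version=None):
    """Build (but do not run) the command that installs a package."""
    if tool == "conda":
        # conda pins a version with a single "="
        spec = package if version is None else f"{package}={version}"
    elif tool == "pip":
        # pip pins a version with "=="
        spec = package if version is None else f"{package}=={version}"
    else:
        raise ValueError(f"unknown tool: {tool!r}")
    return [tool, "install", spec]


print(" ".join(install_command("conda", "pandas")))           # conda install pandas
print(" ".join(install_command("conda", "scipy", "0.15.0")))  # conda install scipy=0.15.0
print(" ".join(install_command("pip", "scipy", "0.15.0")))    # pip install scipy==0.15.0
```

The printed lines are what you would type on the command line; nothing is installed by this sketch.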

Recap

  • Scripts, modules, and packages each play an important role in organizing Python projects: packages are collections of modules, and modules are collections of definitions (functions, classes, etc.)

  • Modules can be imported, but the packages containing those modules need to be installed before use

  • We recommend installing Python packages using the conda package manager (or pip if a package is not available through conda)