Python for Data Analysis

Overview

Welcome to our three-day Python training intensive! This runs twice a year and the next intensive will be in early February.

By the end of the three days, you’ll have learnt the Python skills to manipulate, visualise and present data. We’ll spend roughly half the time learning content, and half the time working on a project in groups.

As we set up, there are a few things to do if you haven’t already:

Software

We are going to use Positron for writing and running Python. This is a friendly integrated development environment (IDE) aimed at researchers.

Once you download and install Positron, launch it on your computer.

Using a non-Positron IDE

TL;DR: If you’re not using Positron, please download and install Quarto manually.

You’re more than welcome to use an IDE other than Positron. Some popular IDEs include:

VS Code
Spyder
PyCharm
IDLE

We’ll be using the rendering and publishing tool Quarto from day 2. While Positron is shipped with Quarto, most other IDEs are not.

If you’re not using Positron, please download and install Quarto or chat to a trainer.
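
If you'd like to check whether Quarto is already available before downloading it, the following is a minimal sketch. It assumes that the Quarto command-line tool is installed under the name `quarto` and has been added to your system PATH; if the check fails even though you installed Quarto, a trainer can help you work out why.

```python
import shutil

# Look for an executable called "quarto" on the system PATH.
# shutil.which returns the full path as a string, or None if nothing is found.
quarto_path = shutil.which("quarto")

if quarto_path is None:
    print("Quarto was not found on your PATH. You may need to install it.")
else:
    print(f"Quarto found at: {quarto_path}")
```

This only checks the PATH, so it is aimed at non-Positron setups. Positron ships with its own copy of Quarto, which may not show up in this check.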

Google Colab

If you aren’t able to install Python and a suitable IDE on your device (e.g. if you do not have permission) then we can find an online alternative for you, likely in the form of Google Colab. Let us know and we’ll help you get set up!

Creating a working directory

Positron, like VS Code, lets you choose a folder to work from. This is convenient for things like file paths, which can then be written relative to that folder.

We recommend that you create a dedicated folder for the intensive’s training. Once you’ve created one, you should open it in Positron with:

File > Open folder... > Click the folder > Select folder

You should find that Positron has ‘restarted’ and placed you inside the folder.
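
To see why a working folder helps with file paths, here is a minimal sketch. It assumes that Python starts in the folder you opened, which is the usual behaviour but worth confirming on your own machine. The file name `data/example.csv` is made up purely for illustration and does not need to exist for the code to run.

```python
from pathlib import Path

# The current working directory: normally the folder you opened in your IDE.
working_dir = Path.cwd()
print(working_dir)

# A relative path is interpreted relative to the working directory,
# so you don't need to type out the full location of a file.
# "data/example.csv" is a hypothetical file name used only as an example.
relative_path = Path("data") / "example.csv"

# Joining the two gives the full location Python would look in.
full_path = working_dir / relative_path
print(full_path)
```

If the first line printed is not your training folder, reopen the folder using the menu steps above.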

Workshops

Over these three days we’ll cover six sessions of content:

Session	Description
The Fundamentals	The basics of Python. Variables, functions and modules.
Data processing	Importing, manipulating and analysing data with `pandas`
Visualisation	Creating visualisations of our data with `seaborn`, `matplotlib` and `plotly`
Sharing and Publishing	Using GitHub for sharing and version control, as well as Quarto for publishing dashboards and websites.
Statistics	Descriptive and inferential statistics, with some regressions and hypothesis testing, using `scipy.stats` and `statsmodels`
Programming Essentials	Python tools everyone should know. Conditionals, loops, functions and importing scripts.

These content sessions are pretty packed, and we won’t have too much time to deviate. That’s why we’ll also have five project sessions - see The Project for details. You’re welcome to ask lengthier questions and play around there!