Python for Data Analysis
Overview
Welcome to our three-day Python training intensive! This runs twice a year and the next intensive will be in early February.
By the end of the three days, you’ll have learnt the Python skills to manipulate, visualise and present data. We’ll spend roughly half the time learning content, and half the time working on a project in groups.
As we set up, there are a few things to do, if you haven’t already:
- Install the software
- Introduce yourself to your table
- Join our Teams channel
- Register your attendance
Software
We are going to use Positron for writing and running Python. This is a friendly integrated development environment (IDE) aimed at researchers.
Once you download and install Positron, launch it on your computer.
TL;DR: If you’re not using Positron, please download and install Quarto manually.
You’re more than welcome to use a different IDE than Positron. Some popular IDEs include
- VS Code
- Spyder
- PyCharm
- IDLE
We’ll be using the rendering and publishing tool Quarto from day 2. While Positron is shipped with Quarto, most other IDEs are not.
If you’re not using Positron, please download and install Quarto or chat to a trainer.
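If you’d like a quick way to check whether Quarto is already available, here is a minimal sketch using Python’s standard library. It assumes the Quarto command-line tool is called `quarto` and that your installer added it to your system `PATH`; the function name `find_quarto` is our own, purely for illustration. If you’re in Positron, the bundled copy may not show up this way, so treat a "not found" result as a prompt to ask a trainer rather than a definite answer.

```python
import shutil


def find_quarto():
    """Return the path to the `quarto` executable, or None if it isn't on PATH."""
    # shutil.which searches the folders listed in the PATH environment variable
    return shutil.which("quarto")


quarto_path = find_quarto()

if quarto_path is None:
    print("Quarto was not found on your PATH. You may need to install it.")
else:
    print(f"Quarto found at: {quarto_path}")
```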
Google Colab
If you aren’t able to install Python and a suitable IDE on your device (e.g. if you do not have permission) then we can find an online alternative for you, likely in the form of Google Colab. Let us know and we’ll help you get set up!
Creating a working directory
Positron, like VS Code, lets you choose a folder to work from. This is convenient because relative filepaths in your code are then resolved from that folder.
We recommend that you create a dedicated folder for the intensive’s training. Once you’ve created one, you should open it in Positron with
File > Open folder... > Click the folder > Select folder
You should find that Positron has ‘restarted’ and placed you inside the folder.
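To confirm that Python is working from the folder you just opened, you could run something like the sketch below in the console. It only uses the standard library. The `data/example.csv` path is a made-up example (not a file we provide) and simply shows how a relative filepath is resolved against the working directory.

```python
from pathlib import Path

# The folder Python is currently working from
cwd = Path.cwd()
print(f"Working directory: {cwd}")

# List what's inside it, so you can check it's the folder you expect
for item in sorted(cwd.iterdir()):
    print(" -", item.name)

# A relative path (hypothetical file name, for illustration only)
data_file = Path("data") / "example.csv"

# Relative paths are interpreted from the working directory
print(f"Relative path: {data_file}")
print(f"Full path:     {data_file.resolve()}")
```

If the working directory printed isn’t your training folder, try opening the folder again from the File menu.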
Workshops
Over these three days we’ll cover six sessions of content:
| Session | Description |
|---|---|
| The Fundamentals | The basics of Python. Variables, functions and modules. |
| Data processing | Importing, manipulating and analysing data with pandas. |
| Visualisation | Creating visualisations of our data with seaborn, matplotlib and plotly. |
| Sharing and Publishing | Using GitHub for sharing and version control, as well as Quarto for publishing dashboards and websites. |
| Statistics | Descriptive and inferential statistics, with some regressions and hypothesis testing, using scipy.stats and statsmodels. |
| Programming Essentials | Python tools everyone should know. Conditionals, loops, functions and importing scripts. |
These content sessions are pretty packed, and we won’t have too much time to deviate. That’s why we’ll also have five project sessions - see The Project for details. You’re welcome to ask lengthier questions and play around there!