Creating a Python Project in the Year 2024
Working with a data-heavy product means I often need to quickly run experiments, whether to validate a new feature or to run some load on production-grade infrastructure.
Wide Angle Analytics is backed exclusively by type-safe Scala. We chose Scala to move fast with the assurance of a great language and ecosystem to deliver correct and efficient solutions.
However, Scala can feel sluggish when attempting to deal with quick experiments.
Scala Alternative
Luckily, there is Perl... just kidding. I love Perl. Python, we are talking about Python. You can't talk about data science, data engineering, or even AI without mentioning Python.
This super expressive, albeit a tad wonky, language convinced hordes of C-style syntax enthusiasts that yes, we can trust braceless code. Heck, even Scala 3 supports braceless code style these days.
Ok, so Python. I would use Python to write experiments. This is often code that is not necessarily throwaway but will not make it into production at Wide Angle. So, tests are less important.
Python Developers
From what I experienced in my career, there are three types of Python developers. I am sure there are more variations, but we are talking about my anecdotal experience. If you don't agree, please take it to Hacker News and rage there.
Script Kiddy
The first type of developer is someone who uses Python as a glorified Bash. Hack and slash but gets the job done. All the dependencies live in the global installation, there is just one file, the script.py, and often, the script has one __main__ function and lots of if/else blocks.
Package Developer
Unlike the script kiddy, this kind of developer uses a virtual environment, defines dependencies, and maybe even pushes the package to git.
A few files, core dependencies, and Bob's your uncle, you got yourself a decent sandbox to run code.
Application Developer
Lastly, you have your serious Application Developers. All you Django aficionados, yes you! This is where I am out of my depth. Personally, I have never delivered a substantial Python project. I saw large codebases and they were intimidating. Python was never my cup of tea for bigger systems.
If I wanted to feel elitist, I would make a snarky remark about how Python is less safe than Scala, or even gasp, Java. But the truth is, Scala was always more familiar and felt easier due to preexisting knowledge.
So, no fault to Python itself, I can't say what a real Python application developer is like.
Ad-Hoc Scripting in Python
Like many readers who have reached this far, I started as your typical Script Kiddy, occasionally nuking local Linux installations by doing some horrible seppuku with system-wide Python installations.
With time, I grew the patience and practice to always build my application package and use that, instead of a hodgepodge of individual scripts.
Besides not destroying your local operating system, here are some benefits to this approach:
- You have full control over your development Python version and the packages it uses.
- When necessary, you can pull external packages, in a specific version, without affecting other projects.
- And finally, you end up with a repeatable build package that you can drop into Git and share with colleagues or the community.
Building a Python Package in 2024
Ok, you are convinced, you go to Google or ChatGPT and ask for instructions on how to create your new Python project/package. Chances are you will get some outdated or completely broken (🙄 AI) code snippets that you will paste into your code editor and end up being thoroughly disappointed.
So, if you are reading this guide today, this is how to do it today. Like everything, the information shared here will get outdated soon. You have been warned.
The Prerequisites
- Python 3
- Pip
Create your new project:
/project
  /README.md
Next, create and activate your virtual environment:
$ python3 -m venv .venv
$ source .venv/bin/activate
With that, you are in your sandbox. Whatever you install via pip will be localized to your project only.
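If you want to double-check that the sandbox is really active, a small Python check can help. This is a minimal sketch and not part of the setup itself: the helper name in_virtualenv is made up for illustration, and it relies on the standard library behavior where sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run it with the python3 from your activated shell. If it reports False, the activation step most likely did not take effect in the current shell.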
Core Project Structure
Now, it is time to lay the foundations for our project structure:
/analyzer
  /README.md
  /.venv
  /pyproject.toml
  /setup.py
  /runner
    /app.py
The pyproject.toml is a package configuration file. Here we will define a build tool, some basic project information, etc.
We are using a boilerplate setup.py for legacy's sake only. It is a very small file:
from setuptools import setup
setup()
And lastly, our code. That lives in analyzer/runner/app.py and of course in many more files that we will later reference.
With that said, let's create our TOML build file:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "waa-analyzer"
authors = [
{name = "Jarek Rozanski", email = "jarek@wideangle.co"},
]
description = "Data experiment for Wide Angle Analytics"
readme = "README.md"
requires-python = ">=3.12"
keywords = ["web-analytics", "analytics"]
dependencies = [
"elasticsearch",
]
version = "0.0.1"
[project.scripts]
analyzer = "runner.app:run"
[tool.setuptools]
packages = ["runner"]
The above TOML file defines a package with the following features:
- The package/project is called waa-analyzer.
- When you run the analyzer script, it will trigger the function run defined in the app.py file, in the runner sub-package.
- The package declares elasticsearch as a runtime dependency.
Pretty neat.
Sample runner application code:
# Entry point for the "analyzer" command.
def run():
    print("Hello, World")
Tip
If you plan on pushing your code to a repository, make sure to create a .gitignore file and exclude .venv from tracked files. You don't want these files in your source control.
Build It
First, build it in development mode, so it is easier to test and change:
python3 -m pip install -e .
And assuming all worked out...
Run It
$ analyzer
Hello, World
The above works as we defined the call to the function run in runner.app as a script. That script was made available in our current path, in the virtual environment. If you restart your shell, you will need to reactivate the previously defined virtual environment.
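Conceptually, the generated command does little more than import the module and call the function. The sketch below mimics that lookup. It is an approximation for illustration, not the actual wrapper that pip writes, and the helper name resolve_entry_point is hypothetical.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(target: str) -> Callable[..., Any]:
    """Turn a "package.module:function" string into the function object."""
    module_name, _, attribute = target.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


if __name__ == "__main__":
    # Inside the activated project you would pass "runner.app:run".
    # A standard library target is used here so the sketch runs anywhere.
    get_cwd = resolve_entry_point("os:getcwd")
    print(get_cwd())
```

Calling resolve_entry_point("runner.app:run")() from within the activated virtual environment should behave like typing analyzer in the shell, assuming the editable install succeeded.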
That's It
And that's how you build a quick Python project, with dependencies and repeatable builds, in 2024.
Go and learn some Python for the greater good 😀