🧩 Maintaining Code Quality with pre-commit¶
Some tips and tools to help keep research software reproducible, reliable and robust.
Update (December 2025)
This post has been updated to use uv instead of Hatch for project setup. If you haven't tried uv yet, you're missing out -- it's an incredibly fast Python package and project manager written in Rust by the Astral team (the same folks behind Ruff). It handles virtual environments, dependency management, and project scaffolding with remarkable speed.
In the realm of research software engineering, writing clean, maintainable, and reproducible code is essential. Researchers often develop software that is shared among teams, published at conferences or in journals, or used to derive critical results in adjacent fields. Adhering to best practices ensures that your code is not only reliable but also understandable and reproducible. This post aims to motivate those practices with examples and showcase several tools that make them easier to apply.
This post focuses on using pre-commit for maintaining (or enforcing, depending on how you look at it) code quality, while follow-up posts will cover pytest for testing and git for version control.
Why Best Practices Matter¶
- Reproducibility: Ensures that research findings can be consistently replicated.
- Collaboration: Makes it easier for team members to understand and contribute to the project.
- Maintenance: Reduces the technical debt and makes the code easier to maintain and extend.
- Reliability: Helps in catching bugs and issues early, ensuring robust software.
%%{init: {'flowchart': {'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}}}%%
flowchart LR
subgraph "Cost to Fix"
A["During Writing<br/>$1"] --> B["At Commit<br/>$10"]
B --> C["In Code Review<br/>$100"]
C --> D["After Merge<br/>$1000"]
D --> E["In Production<br/>$10000"]
end
style A fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
style B fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
style C fill:#fff3e0,stroke:#f57c00,color:#e65100
style D fill:#ffebee,stroke:#c62828,color:#b71c1c
style E fill:#ffebee,stroke:#c62828,color:#b71c1c
The cost of fixing defects increases exponentially as they move through the development pipeline. Pre-commit hooks catch issues at the cheapest point.
Code Quality with pre-commit¶
Maintaining code quality is critical, and the pre-commit framework helps automate this by running checks before code is committed. While some may see code style or quality as a non-issue, having a consistent style and checking for linting errors can relieve the programmer's mental burden significantly. It is also amazingly easy to automate, so why not get tools in place so you don't even need to think about it?
While other frameworks exist, such as Husky and Overcommit, pre-commit lets developers define a set of checks (hooks) that run before a commit is made, ensuring code quality and consistency before changes enter the repository. These hooks can perform tasks such as code formatting, linting, checking for secrets, or running tests. Originally created at Yelp in 2014, pre-commit set out to provide a common interface for pre-commit hooks across different languages and projects, promoting best practices and code quality from the very start of the development process.
Getting Started with pre-commit¶
1. Installation:
The easiest way to install pre-commit is via uv:
$ uv add pre-commit --dev
Alternatively, you can use pip install pre-commit or your package manager of choice. Let's put together a minimal repo and install our first pre-commit checks.
Info
For this demo example I will use uv for project setup and dependency management. Feel free to use whichever tool you are familiar with, but I highly recommend giving uv a try!
$ uv init hello --package
Initialized project `hello` at `/tmp/hello`
$ cd hello && tree
.
├── pyproject.toml
├── README.md
└── src
    └── hello
        └── __init__.py
3 directories, 3 files
$ echo 'print("Hello, World!")' >> src/hello/main.py
2. Configuration:
Once it is installed, simply create a .pre-commit-config.yaml file in the root of the repo like so (note the .yaml extension, which is the file name pre-commit looks for by default):
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.5
    hooks:
      - id: ruff
Don't worry, we will cover the hooks later on...
3. Install pre-commit Hooks:
$ pre-commit install
Note
This only installs the hooks locally. It is important to stress that they do not take effect upstream, even after a push, so for team development it is worth agreeing on guidelines for setting up pre-commit.
Warning
pre-commit naturally requires there to be a version control system in place to work. Otherwise you see the following error:
$ pre-commit install
An error has occurred: FatalError: git failed. Is it installed, and are you
in a Git repository directory? Check the log at
/Users/tallam/.cache/pre-commit/pre-commit.log
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
4. Run pre-commit:
You can manually run all hooks on all files:
$ pre-commit run --all-files
[INFO] Initializing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...............................................................Passed
Check for added large files..............................................Passed
black....................................................................Passed
ruff.....................................................................Passed
You may have noticed above that pre-commit has some very useful built-in checkers. They are the first four hooks in the configuration, all from the pre-commit-hooks repository:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.5
    hooks:
      - id: ruff
One simple example is trailing-whitespace which will fail on extra white space. Let's add some to the main.py file to see what happens.
$ echo " " >> src/hello/main.py
diff --git a/src/hello/main.py b/src/hello/main.py
index 7df869a..3c8ba12 100644
--- a/src/hello/main.py
+++ b/src/hello/main.py
@@ -1 +1,2 @@
print("Hello, World!")
+
$ git add . && git commit -m "whitespace test"
Trim Trailing Whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing src/hello/main.py
Fix End of Files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing src/hello/main.py
Check Yaml...........................................(no files to check)Skipped
Check for added large files..............................................Passed
black....................................................................Passed
ruff.....................................................................Passed
Fail. But pre-commit can automatically apply the fix for you. So when we have another look we see it has already been removed:
$ git status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: src/hello/main.py
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: src/hello/main.py
# git diff
diff --git a/src/hello/main.py b/src/hello/main.py
index 3c8ba12..7df869a 100644
--- a/src/hello/main.py
+++ b/src/hello/main.py
@@ -1,2 +1 @@
print("Hello, World!")
-
# git diff --staged
diff --git a/src/hello/main.py b/src/hello/main.py
index 7df869a..3c8ba12 100644
--- a/src/hello/main.py
+++ b/src/hello/main.py
@@ -1 +1,2 @@
print("Hello, World!")
+
This may seem annoying, "it's only whitespace...🙄", but as you can see above, even the slightest deviation from what is expected can cause unnecessary diffs in the git logs. That makes life harder for team members trying to get their heads around complex code changes, because changes that do not alter the logic of the code are just a distraction. And trust me, when combing through thousands of lines of code looking for a bug, the fewer distractions and the less diff content, the better!
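To make the automatic fix above a little less magical, here is a minimal Python sketch of roughly what the trailing-whitespace and end-of-file-fixer hooks did to main.py. The function name fix_whitespace is my own, and the real hooks handle more edge cases (different line endings, binary files and so on), so treat this as an approximation rather than their actual implementation.

```python
def fix_whitespace(text: str) -> str:
    """Strip trailing whitespace from each line and end with a single newline.

    Illustrative only: an approximation of the combined effect of the
    trailing-whitespace and end-of-file-fixer hooks on a text file.
    """
    # Remove spaces and tabs at the end of every line.
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines left over at the end of the file.
    while lines and not lines[-1]:
        lines.pop()
    # A non-empty file ends with exactly one newline.
    return "\n".join(lines) + "\n" if lines else ""


# The file from the demo: the original line plus the " " we appended.
before = 'print("Hello, World!")\n \n'
print(repr(fix_whitespace(before)))
```

Running it on the demo content gives back the original one-line file, which matches the diff pre-commit produced.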
As programmers we spend far more time reading code than writing it, so enforcing code quality applies not only to silly examples like whitespace but also to the consistency of indentation or the style of if/else blocks, to name a few. At this point I'd encourage you to look at the built-in hooks found here: https://pre-commit.com/hooks.html
All can make a difference and having a strict enforcement of style will make reading code far easier.
Some particularly useful ones are black, the "opinionated Python formatter"; ruff, a blazingly fast linter for Python; and mypy, for static type checking of Python code.
Custom Hooks¶
The awesome thing about pre-commit is you can also create custom hooks to enforce specific project requirements.
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: custom-check
        name: Custom Check
        entry: ./scripts/custom-check.sh
        language: script
        files: \.py$
# ./scripts/custom-check.sh
#!/bin/bash
# Custom script for checking something specific
echo "Running custom check"
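The shell script above is only a placeholder. As a rough sketch of what a real check might look like, here is a hypothetical Python version that fails when a file contains a marker string. pre-commit passes the matching filenames to the hook as command-line arguments and treats a non-zero exit code as a failure; the marker text and the function names are assumptions of mine for illustration.

```python
from pathlib import Path

# Hypothetical marker; pick whatever your project wants to forbid.
MARKER = "DO NOT COMMIT"


def find_violations(paths, marker=MARKER):
    """Return (path, line number) pairs for every line containing the marker."""
    violations = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                violations.append((str(path), lineno))
    return violations


def main(argv):
    """Report each violation and return the exit code pre-commit will see."""
    violations = find_violations(argv)
    for path, lineno in violations:
        print(f"{path}:{lineno}: contains '{MARKER}'")
    # Non-zero tells pre-commit the hook failed and the commit should stop.
    return 1 if violations else 0
```

In a real script you would import sys and finish with raise SystemExit(main(sys.argv[1:])), then point the hook's entry at that file.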
Popular Hooks Reference¶
Beyond the basics, there's a rich ecosystem of hooks available. Here's a curated selection organised by language and purpose:
Python Hooks¶
| Hook | Purpose | Speed | Auto-fix |
|---|---|---|---|
| black | Opinionated code formatter | Fast | Yes |
| ruff | Extremely fast linter (replaces flake8, isort, etc.) | Very Fast | Partial |
| ruff-format | Ruff's formatter (Black-compatible) | Very Fast | Yes |
| mypy | Static type checking | Slow | No |
| bandit | Security linter | Fast | No |
| isort | Import sorting | Fast | Yes |
# Modern Python stack (recommended)
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.3.4
  hooks:
    - id: ruff
      args: [--fix, --exit-non-zero-on-fix]
    - id: ruff-format
# Type checking
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v1.9.0
  hooks:
    - id: mypy
      additional_dependencies:
        - types-requests
        - types-PyYAML
Ruff: The Modern Choice
Ruff has largely replaced the combination of Black + isort + Flake8 for many projects. Written in Rust, it's 10-100x faster than traditional Python linters and can replace multiple tools with a single configuration.
Rust Hooks¶
| Hook | Purpose | Auto-fix |
|---|---|---|
| cargo-check | Type checking and compilation | No |
| rustfmt | Official Rust formatter | Yes |
| clippy | Linting with helpful suggestions | Partial |
- repo: local
  hooks:
    - id: cargo-fmt
      name: cargo fmt
      entry: cargo fmt --all -- --check
      language: system
      types: [rust]
      pass_filenames: false
    - id: cargo-clippy
      name: cargo clippy
      entry: cargo clippy --all-targets --all-features -- -D warnings
      language: system
      types: [rust]
      pass_filenames: false
JavaScript/TypeScript Hooks¶
| Hook | Purpose | Auto-fix |
|---|---|---|
| prettier | Code formatter | Yes |
| eslint | Linting | Partial |
| tsc | TypeScript type checking | No |
- repo: https://github.com/pre-commit/mirrors-prettier
  rev: v3.1.0
  hooks:
    - id: prettier
      types_or: [javascript, jsx, ts, tsx, json, yaml, markdown]
- repo: https://github.com/pre-commit/mirrors-eslint
  rev: v8.56.0
  hooks:
    - id: eslint
      files: \.[jt]sx?$
      types: [file]
      additional_dependencies:
        - eslint@8.56.0
        - "@typescript-eslint/eslint-plugin@7.0.0"
        - "@typescript-eslint/parser@7.0.0"
Infrastructure Hooks¶
| Hook | Purpose | Auto-fix |
|---|---|---|
| terraform-fmt | Terraform formatting | Yes |
| hadolint | Dockerfile linting | No |
| shellcheck | Shell script analysis | No |
| shfmt | Shell script formatting | Yes |
- repo: https://github.com/shellcheck-py/shellcheck-py
  rev: v0.9.0.6
  hooks:
    - id: shellcheck
- repo: https://github.com/scop/pre-commit-shfmt
  rev: v3.8.0-1
  hooks:
    - id: shfmt
Security-Focused Hooks¶
Security deserves special attention. These hooks help prevent sensitive data from being committed and identify potential vulnerabilities.
Secrets in Git History
Once a secret is committed, it exists in git history forever -- even after deletion. Prevention is far easier than remediation. Use these hooks to catch secrets before they're committed.
Preventing Secret Leaks¶
# Built-in secret detection
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.5.0
  hooks:
    - id: detect-private-key
    - id: detect-aws-credentials
# Comprehensive secret scanning with Gitleaks
- repo: https://github.com/gitleaks/gitleaks
  rev: v8.18.2
  hooks:
    - id: gitleaks
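To give a feel for how this kind of detection works, here is a deliberately simplified Python sketch that flags text containing a PEM private key header. The pattern and the function name contains_private_key are my own assumptions for illustration; the real detect-private-key and Gitleaks hooks know about many more key formats and secret types.

```python
import re

# Simplified pattern: a PEM header such as "-----BEGIN RSA PRIVATE KEY-----".
# The optional group covers key types like RSA, EC or OPENSSH.
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----")


def contains_private_key(text: str) -> bool:
    """Return True if the text appears to contain a PEM private key header."""
    return PRIVATE_KEY_HEADER.search(text) is not None
```

A hook built on this idea would read each staged file, call the function on its contents, and exit with a non-zero status if anything matched.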
Security Linting¶
# Python security with Bandit
- repo: https://github.com/PyCQA/bandit
  rev: 1.7.8
  hooks:
    - id: bandit
      args: [-ll, -ii] # Medium severity, medium confidence
# SAST for multiple languages with Semgrep
- repo: https://github.com/returntocorp/semgrep
  rev: v1.64.0
  hooks:
    - id: semgrep
      args: ["--config", "auto", "--error"]
Performance Considerations for Large Repositories¶
When working with large repositories, pre-commit hooks can become slow. Here are strategies to maintain performance:
1. Limit Hook Scope¶
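At commit time pre-commit already passes only the staged files to each hook. You can narrow a hook further by restricting the stage it runs at and the file types it sees:
repos:
  - repo: https://github.com/psf/black
    rev: 23.7.0
    hooks:
      - id: black
        # Only run at the commit stage (spelled pre-commit in pre-commit >= 3.2)
        stages: [commit]
        # Limit to specific file types
        types: [python]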
2. Use Parallel Execution¶
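Pre-commit runs hooks one after another, but by default it splits each hook's file list into batches and runs those batches in parallel across CPU cores (hooks that set require_serial: true opt out of this). The command below does not change the level of parallelism; it runs the hooks registered for the push stage on all files and prints extra per-hook detail, which helps when looking for the slow ones:
# Run push-stage hooks on all files with verbose output
$ pre-commit run --all-files --hook-stage push --verbose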
3. Skip Expensive Hooks Locally¶
For computationally expensive checks, consider running them only in CI:
repos:
  - repo: local
    hooks:
      - id: expensive-test
        name: Expensive Test Suite
        entry: pytest tests/integration
        language: system
        pass_filenames: false
        # Only run during CI, not locally
        stages: [manual]
Then in CI:
$ pre-commit run --hook-stage manual
4. Cache Dependencies¶
Pre-commit caches hook environments by default in ~/.cache/pre-commit. For CI systems, ensure this directory is cached between runs. If you're using uv with the astral-sh/setup-uv action, caching is built-in:
# Example for GitHub Actions with uv (recommended)
- name: Install uv
  uses: astral-sh/setup-uv@v6
  with:
    enable-cache: true
    cache-dependency-glob: "uv.lock"
# Also cache pre-commit environments
- uses: actions/cache@v4
  with:
    path: ~/.cache/pre-commit
    key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
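The reason for keying the cache on the config file is that the hook environments only need rebuilding when the configuration changes. Here is a small Python sketch of that idea, assuming a plain SHA-256 of the file contents. GitHub's hashFiles computes its digest differently in detail, and the function name cache_key is hypothetical, so this only illustrates the principle.

```python
import hashlib


def cache_key(config_bytes: bytes, prefix: str = "pre-commit-") -> str:
    """Derive a cache key from the contents of the pre-commit config.

    Illustrative only: identical config contents give an identical key,
    and any change to the config gives a new key (a fresh cache).
    """
    return prefix + hashlib.sha256(config_bytes).hexdigest()


old = cache_key(b"rev: v0.5.5\n")
new = cache_key(b"rev: v0.5.6\n")
print(old == cache_key(b"rev: v0.5.5\n"), old == new)
```

Bumping a hook's rev changes the key, so CI rebuilds the environments once and then reuses them until the next config change.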
Integration with CI/CD Pipelines¶
Pre-commit hooks should complement, not replace, your CI/CD pipeline. Here's how to integrate them effectively:
GitHub Actions Example¶
name: Pre-commit Checks
on: [push, pull_request]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true
      - name: Set up Python
        run: uv python install
      - name: Install pre-commit
        run: uv tool install pre-commit
      - name: Run pre-commit
        run: pre-commit run --all-files --show-diff-on-failure
GitLab CI Example¶
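pre-commit:
  image: python:3.12
  variables:
    UV_CACHE_DIR: .uv-cache
    # GitLab only caches paths inside the project directory,
    # so keep the pre-commit environments there too
    PRE_COMMIT_HOME: ${CI_PROJECT_DIR}/.cache/pre-commit
  before_script:
    - curl -LsSf https://astral.sh/uv/install.sh | sh
    - source $HOME/.local/bin/env
    - uv tool install pre-commit
  script:
    - pre-commit run --all-files --show-diff-on-failure
  cache:
    paths:
      - .uv-cache
      - .cache/pre-commit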
Best Practices for CI Integration¶
- Run on all files in CI: Whilst local hooks run only on changed files, CI should validate the entire codebase
- Fail fast: Configure pre-commit to fail on first error in CI to save time
- Show diffs: Use --show-diff-on-failure to help developers understand what needs fixing
- Version lock: Pin pre-commit hook versions to ensure consistency across environments
Common Troubleshooting¶
Hook Fails But Files Aren't Modified¶
Problem: A hook reports failures but doesn't show what's wrong.
Solution: Use verbose mode to see detailed output:
$ pre-commit run --all-files --verbose
Hooks Run Too Slowly¶
Problem: Pre-commit takes a long time before each commit.
Solution:
- Check which hooks are slow:
$ time pre-commit run --all-files
- Consider running expensive hooks only in CI (see Performance Considerations above)
Hook Environment Issues¶
Problem: A hook works in CI but fails locally or vice versa.
Solution:
- Ensure hook versions match between .pre-commit-config.yaml and CI
- Clear and rebuild hook environments:
$ pre-commit clean
$ pre-commit install --install-hooks
Skipping Hooks in Emergencies¶
Problem: Need to commit urgently but hooks are failing.
Solution: Use --no-verify flag, but use sparingly and fix issues immediately after:
$ git commit --no-verify -m "Emergency fix"
Warning: Bypassing hooks should be exceptional and requires discipline to fix issues promptly. Consider whether the "emergency" truly justifies skipping quality checks.
Files Not Being Checked¶
Problem: Pre-commit skips certain files.
Solution: Check your configuration:
- Verify file patterns match:
files: \.py$ # Only matches .py files
- Check exclude patterns:
exclude: ^(docs/|tests/fixtures/)
- Ensure files are tracked by git (pre-commit only runs on tracked files)
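When a file is unexpectedly skipped, it can help to test the patterns by hand. pre-commit treats files and exclude as Python regular expressions searched against the path relative to the repository root. The helper name is_selected below is hypothetical and ignores the types filters, so it is only a rough model of the selection logic, not pre-commit's own code.

```python
import re


def is_selected(path: str, files: str = r"", exclude: str = r"^$") -> bool:
    """Roughly model whether a hook's files/exclude patterns select a path.

    The defaults mirror pre-commit's documented defaults: an empty files
    pattern matches everything and "^$" excludes nothing.
    """
    return bool(re.search(files, path)) and not re.search(exclude, path)


pattern = r"\.py$"
skip = r"^(docs/|tests/fixtures/)"
for candidate in ["src/hello/main.py", "docs/conf.py", "README.md"]:
    print(candidate, is_selected(candidate, files=pattern, exclude=skip))
```

With the patterns from this section, src/hello/main.py is selected, docs/conf.py is excluded by the docs/ prefix, and README.md never matches the files pattern.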
Conclusion¶
Hopefully this has given you some motivation and also pointed you in the direction where you can learn more about enforcing code quality in your projects. Always strive to automate everything and if you can automate code quality checks, why not?!
Related reading: For more on git workflows and best practices, see my posts on Learn you some git for great-good and profit and Mirroring Public Repositories on Github, Privately.
Further Resources¶
- Pre-commit Documentation - Official docs and hook directory
- Pre-commit Hooks Repository - Standard hooks collection
- Ruff - Fast Python linter and formatter
- Gitleaks - Secret detection tool
Happy coding! And be forceful with style