A simple script that uses natural language processing to keep Git commits with bad messages from getting merged.
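A Python3 script, easy to integrate with various CI machinery (e.g. see the GitHub Actions example), that is meant to keep bad commit messages out of a project. It verifies whether the seven rules of a great Git commit message by Chris Beams are being followed:

1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how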
The more a project grows in contributors and size, the more important the quality of its commit messages becomes.
It is ultimately hard to quantify what makes a good commit message. However, we can easily determine automatically whether it adheres to most of the Git commit message best practices. Having a script verify that automatically, alongside other rules that you might already enforce (e.g. checking whether a task or a requirement is referenced), can save a lot of time in the long run and even avoid conflicts between the reviewer(s) and the reviewee.
After adopting this script, what you get is a set of best practices, agreed upon by the project and automatically applied. This way, everyone has to follow them or CI will not allow that commit in. Simple as that! :innocent:
Most of the rules are trivial to implement in code, except for two of them: no. 5 and no. 7. Specifically, checking whether the commit subject begins with imperative mood is tricky due to limitations of the Natural Language Processing libraries being used. You can read more about the constraints here. Essentially, it boils down to the scarcity of imperative sentences in the datasets used to train the relevant statistical models. Consequently, the enforcement of this rule might produce some false positives and false negatives.
On the other hand, verifying whether the commit body actually explains what and why instead of how is not (?) possible, due to the subjective nature of the problem. All in all, in most cases this is the only check the reviewers would still have to do manually to ensure the quality of a commit message.
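To illustrate why the remaining rules are straightforward to automate, here is a minimal Python sketch of the mechanical checks (rules 1, 2, 3, 4 and 6). It is not the script's actual source: the function name `check_commit_message` and the format of the returned strings are assumptions made for illustration, and rules 5 and 7 are deliberately left out.

```python
def check_commit_message(message, subject_limit=50, body_limit=72):
    """Return a list of violations; an empty list means every check passed.

    Hypothetical helper for illustration only, not part of the real script.
    """
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []

    if not subject:
        return ["Subject line is empty"]

    # Rule 1: when there is a body, the second line must be blank.
    if len(lines) > 1 and lines[1].strip():
        problems.append("Rule 1: separate subject from body with a blank line")

    # Rule 2: the subject must not exceed the configured limit.
    if len(subject) > subject_limit:
        problems.append(f"Rule 2: subject is longer than {subject_limit} characters")

    # Rule 3: the subject must start with a capital letter.
    if not subject[0].isupper():
        problems.append("Rule 3: capitalize the subject line")

    # Rule 4: the subject must not end with a period.
    if subject.endswith("."):
        problems.append("Rule 4: do not end the subject line with a period")

    # Rule 6: every line after the subject must fit within the body limit.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > body_limit:
            problems.append(f"Rule 6: line {number} is longer than {body_limit} characters")

    return problems


# Example: report the problems found in a sloppy message.
for problem in check_commit_message("added feature.\nno blank line"):
    print(problem)
```

Returning a list of violations rather than a single boolean makes it easy to show the contributor everything that needs fixing in one go.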
You need to have Python3 installed and follow the steps below:
* Install TextBlob
  * `pip3 install --user textblob`
* Install NLTK corpora
  * `python3 -m textblob.download_corpora`
* Run the script to verify a commit message
  * `python3 bad_commit_message_blocker.py --message "Add a really cool feature"`
* To define your own maximum character limits, call the script with the appropriate arguments:
  * `--subject-limit` (defaults to `50`) to set the subject line limit. E.g.:
    * `python3 bad_commit_message_blocker.py --subject-limit 80 --message "Add a really cool feature"`
  * `--body-limit` (defaults to `72`) to set the body line limit. E.g.:
    * `python3 bad_commit_message_blocker.py --body-limit 120 --message "Add a really cool feature"`
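For readers curious how a command line with these three flags and their defaults could be declared, the following is a small `argparse` sketch. It only mirrors the flags and default values documented above; the function name `build_parser` and the help texts are assumptions, and the real script may be organized differently.

```python
import argparse


def build_parser():
    """Declare the documented flags; a hypothetical stand-in for the real CLI."""
    parser = argparse.ArgumentParser(
        description="Check a commit message against best practices"
    )
    parser.add_argument("--message", required=True,
                        help="the commit message to verify")
    parser.add_argument("--subject-limit", type=int, default=50,
                        help="maximum length of the subject line")
    parser.add_argument("--body-limit", type=int, default=72,
                        help="maximum length of each body line")
    return parser


# Parse an explicit argument list, equivalent to the command-line examples above.
args = build_parser().parse_args(
    ["--subject-limit", "80", "--message", "Add a really cool feature"]
)
print(args.subject_limit, args.body_limit, args.message)
```

`argparse` converts the dashes in `--subject-limit` and `--body-limit` to underscores, so the values are read as `args.subject_limit` and `args.body_limit`.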
Now you can use this script as part of your GitHub Actions CI pipeline.
An example configuration can be seen below:
```yaml
name: Commit messages

on: [pull_request]

jobs:
  check-commit-message:
    if: github.event.pull_request.user.type != 'Bot' # a number of GitHub Apps that can send PRs don't have configurable commits
    runs-on: ubuntu-20.04
    steps:
      - name: Verify commit messages follow best practices in CI
        uses: platisd/[email protected]
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          # Optionally set the subject character limit (default 50)
          subject_limit: 60
          # Optionally set the body character limit (default 72)
          body_limit: 100
          # Optionally set the remote branch name to merge (default master)
          remote_branch: dev
```
It would be nice to have example config files for using this script in:
Not sure if it's better to have them as files in the repository (which might pollute it) or as part of the README.
The current action implementation assumes that commits always come from the upstream repository, but that is not the case for pull requests opened from forks.
Context: https://github.com/cherrypy/cheroot/pull/401#issuecomment-1003351474.
This needs to be fixed here: https://github.com/platisd/bad-commit-message-blocker/blob/master/entrypoint.sh#L14.
Action items:
* [ ] Check the GitHub Actions docs to figure out the variable that exposes this information
* [ ] Use that to send a PR

I think it would be reasonable to refactor the GHA to use a better action type. Docker is limited to Ubuntu workers and has a certain amount of runtime overhead that I believe can be addressed with Composite GitHub Actions. I've briefly checked the current setup and I don't believe there's anything that is tightly coupled with the Docker runtime and can't be substituted.
Ref: https://docs.github.com/en/actions/creating-actions/creating-a-composite-action
Adding a file checker so the script can be used with a Git commit hook. Also adding an example `commit-msg` hook to be placed in `.git/hooks`.
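As a rough idea of what such a hook could look like, here is a minimal Python sketch of a `commit-msg` hook. It is not the example shipped with this change. It relies on two assumptions that are flagged in the comments: that `bad_commit_message_blocker.py` sits in the repository root, and that it exits with a non-zero status when the message is rejected. The one thing taken from Git itself is that the hook receives the path of the file holding the proposed message as its only argument.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumption: the checker lives in the repository root, where Git runs hooks.
CHECKER = "bad_commit_message_blocker.py"


def read_commit_message(path):
    """Return the message stored at `path` without '#' comment lines."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle if not line.startswith("#")]
    return "\n".join(lines).strip()


def main(argv):
    message = read_commit_message(argv[1])
    # Assumption: a non-zero exit status from the checker means "bad message".
    # Returning it from the hook makes Git abort the commit.
    result = subprocess.run([sys.executable, CHECKER, "--message", message])
    return result.returncode


# Git invokes a commit-msg hook with exactly one argument: the message file.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

To try it, the file would be saved as `.git/hooks/commit-msg` and made executable. The comment lines are filtered out because the message file may still contain the `#` hints that Git writes into the editor template.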