Adding new features to GitLab Workhorse
GitLab Workhorse is a smart reverse proxy for GitLab. It handles “large” HTTP requests such as file downloads, file uploads, Git push/pull and Git archive downloads.
Workhorse itself is not a feature, but there are several features in GitLab that would not work efficiently without Workhorse.
At first glance, it may look like Workhorse is just a pipeline for processing HTTP streams so that you can reduce the amount of logic in your Ruby on Rails controller.
Engineers who set out to offload a feature to Workhorse often find that the effort is much greater than originally anticipated. This is partly because of the new programming language (only a few engineers at GitLab are Go developers) and partly because of the demanding requirements for Workhorse: it is stateless, memory and disk usage must be kept under tight control, and the request must not be slowed down in the process.
What is a “large” request?
If most of the time is spent moving bytes from one end to the other, then it’s a “large” request.
Running git pull, uploading an artifact, and downloading an artifact are all good examples of large requests.
With the rise of cloud-native installations, Workhorse’s feature set was extended with direct upload to object storage, which removes the need for shared Network File System (NFS) drives.
You can watch the following presentation for more details on the history of Workhorse and the NFS removal.
Can I add a feature to Workhorse?
Large requests usually involve file uploads, so first of all please familiarize yourself with the Uploads development documentation. It covers the most common use cases for adding a new type of upload and may answer all of your questions.
What if I need to process incoming/outgoing requests?
We suggest following this route only if it is absolutely necessary and no other options are available.
Splitting a feature between the Rails codebase and Workhorse means deliberately choosing to introduce technical debt. It adds complexity to the system and coupling between the two components.
The Ruby on Rails solution for this class of problems is asynchronous processing. So please think about why this feature can’t be implemented in Sidekiq.
The following considerations may help you answer that question:
- Sidekiq jobs are easier to write and review: they are written in Ruby, and your job can be part of the same merge request that introduces the change
- We have better observability and scalability tools for Sidekiq jobs. We can scale the job processing machines/pods independently, as well as stop processing a specific queue
- Sidekiq has a reduced blast radius. Each Puma instance has only one Workhorse. A failure at the Workhorse level is more likely to impact the whole machine, while a failure at the Sidekiq level is confined to the queue and could result in a broken or partially degraded feature.
- Workhorse can extract single files into a remotely stored zip archive without downloading the whole archive (see
- Workhorse can’t store files on disk or in memory, so every type of processing can only be executed while streaming the body. The only exception where we write files to disk is disk-buffered uploads, but this may change in the future because it prevents splitting Workhorse and Puma into separate containers on cloud-native installations.
- During an outage, the dev-escalation engineer will likely be more familiar with our Ruby codebase. Finding and fixing a problem in Workhorse will be harder.
If you still think we should add a new feature to Workhorse, please open an issue explaining what you want to implement and why it can’t be implemented in our Ruby codebase. Workhorse maintainers will be happy to help you assess the situation.