Smile news

How to configure a Docker Hub proxy with Harbor?

  • Event date: Jul. 29, 2023

Today, containerization is used on a daily basis. Most containers are based on images coming from a registry, usually the Docker Hub registry: “the world’s leading service for finding and sharing container images”. Since November 20, 2020, when used anonymously, the Docker Hub registry has a pull limit of 100 pulls per 6 hours per IP address (200 for authenticated users, up to 5,000/day for paid subscriptions). When the developers in your company, your CI runners and/or your Kubernetes cluster share the same public IP address, this limit can be reached very quickly, as every pull (whether triggered manually, by a CI build or by a deployment) counts against it.
When the limit is reached, subsequent pulls are rejected, blocking developers, CI builds or Kubernetes deployments. In that case, you should notice one of the following error messages:

ERROR: toomanyrequests: Too Many Requests.

or:

You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limits.

A solution could be to create an account to benefit from twice the limit (200 pulls/6 hours instead of 100), or even to pay for a subscription to increase the limit to 5,000 pulls per day. This solution could work and would require configuring the clients (on developer computers, CI, Kubernetes, …) to authenticate, but it may just postpone the moment when you reach the limit.


If you have an enterprise docker registry (Harbor, Nexus, …), a better solution is to use it as a proxy cache. Instead of having every client (every developer, CI runner, Kubernetes node, …) pull the same docker image from the Docker Hub registry, the proxy cache pulls that image only once from the Docker Hub registry and serves the cached image to every client, thus reducing the number of pulls from the Docker Hub registry.


In our case, we wanted to use Harbor as the proxy cache since it was already our company docker registry. Sadly, the documentation and most tutorials on the internet only explain how to configure the proxy cache in Harbor, never how to configure the docker client to make it work with the proxy cache. On the other hand, the docker documentation only explains how to configure a mirror, but not how to make it work with a proxy cache.

While we could think that having both server and client documentations would be enough, we’ll see that there’s a piece missing.

The problem

As mentioned, creating a proxy cache in Harbor is quite easy by following the documentation:

  • create a registry endpoint targeting https://hub.docker.com
  • create a new project
    • in our case, we called the project “hub”
    • check the “Proxy Cache” checkbox
    • select the previously created registry endpoint

With that configuration, the expected URLs to pull images are of the form (note the “hub” part in the URL):

  • https://harborUrl/hub/library/nginx for nginx image
  • https://harborUrl/hub/bitnami/postgresql for bitnami/postgresql image

Let’s now configure our docker client by following the corresponding documentation to use that URL as a mirror. Configure /etc/docker/daemon.json with:

{
    "registry-mirrors": ["https://harborUrl/hub"]
}

When restarting the docker daemon (systemctl restart docker), it fails with the following error in the logs:

failed to start daemon: invalid mirror: path, query, or fragment at end of the URI "https://harborUrl/hub"

Indeed, the docker client does not allow a context as part of the mirror: it’s only expecting a host. That means we have to configure https://harborUrl instead of https://harborUrl/hub. With that configuration:

  • pulling the nginx image would result in pulling from https://harborUrl/library/nginx
  • pulling the bitnami/postgresql image would result in pulling from https://harborUrl/bitnami/postgresql

As we can see, neither matches the expected URLs for the Harbor “hub” project. Also, pulling a “library” image and a namespaced one would result in pulling from 2 different Harbor projects. So it is not possible to configure docker to use a single Harbor proxy cache project as a mirror without adding another piece to the puzzle.
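To illustrate the mismatch, here is a small Python sketch of how the daemon builds manifest URLs when a mirror is configured (the helper is hypothetical; the expansion rule is docker's documented behavior: official images get the library/ prefix, and the mirror host is prepended to the repository path):

```python
# Hypothetical helper mirroring docker's documented behavior:
# official images like "nginx" are expanded to "library/nginx",
# then the mirror host is prepended to the repository path.

def mirror_manifest_url(mirror_host: str, image: str, tag: str = "latest") -> str:
    repository = image if "/" in image else f"library/{image}"
    return f"https://{mirror_host}/v2/{repository}/manifests/{tag}"

print(mirror_manifest_url("harborUrl", "nginx"))
# https://harborUrl/v2/library/nginx/manifests/latest
print(mirror_manifest_url("harborUrl", "bitnami/postgresql"))
# https://harborUrl/v2/bitnami/postgresql/manifests/latest
# Neither URL contains the "hub" project expected by the Harbor proxy cache.
```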


Note that, contrary to docker, podman (a daemonless container engine alternative to docker) has a prefix/location feature that can make it work with that Harbor configuration. The following registries.conf configuration:

[[registry]]
prefix="docker.io"
location="harborUrl/hub"

would make the command “podman pull docker.io/nginx” pull the image from the “hub” Harbor proxy.


Also note that the Nexus repository manager is aware of this particularity, as it allows mapping a docker repository at the root (without a context path) by using a different port than the one used for all the other Nexus repositories. That means it can map https://nexusURL/hub to https://nexusURL:8443/, which is a valid configuration for docker.


As Harbor doesn’t have that feature, we decided to implement a proxy with some rewriting between the docker client and Harbor. Contrary to the Nexus solution, we decided to use another domain dedicated to the Docker Hub proxy instead of using a different port, but that’s just a matter of choice or preference.

Solution for Harbor

Suppose our Harbor is available on https://harborUrl/ and that we want to expose the proxy cache (the “hub” project in Harbor) on https://dockerhubProxyCache/.


We’ll first define the expected requests/responses between the docker daemon and the registry. Then, we’ll see the Nginx configuration required to comply with those expectations.
 

Expected requests/responses

When asked to pull the “php” image, the docker daemon is expected to make the following requests and receive the following responses (thanks to that article for the details). In our example, we use the dockerhubProxyCache domain to have an idea of the expected responses our proxy should send:

1. GET on https://dockerhubProxyCache/v2/

  • the registry should reply with 401 code
  • the response header Www-Authenticate should be set to: Bearer realm="https://dockerhubProxyCache/service/token",service="harbor-registry" to give the docker daemon information on where to ask for the JWT (see next step)

2. GET on https://dockerhubProxyCache/service/token?scope=repository%3Alibrary%2Fphp%3Apull&service=harbor-registry

  • the registry should reply with 200 code
  • the response body should contain a JWT token giving pull access on the library/php image

3. GET on https://dockerhubProxyCache/v2/library/php/manifests/latest

  • the request should have an Authorization header with the previously received JWT as Bearer
  • the response should have a body with a JSON describing the image layers with their corresponding sha256 digest (or it may have a body with a “fat manifest” which would trigger another manifest request to get the one corresponding to the platform architecture/os)

4. GET on https://dockerhubProxyCache/v2/library/php/blobs/sha256:...

  • the request should have an Authorization header with the previously received JWT as Bearer
  • one request for every layer
  • the response contains the corresponding layer binary data
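As a side note, the daemon derives the step-2 token URL from the realm and service values returned in step 1. A minimal Python sketch of that derivation, using the header and scope format shown above (the function is illustrative, not part of the setup):

```python
import re
from urllib.parse import quote

def token_url(www_authenticate: str, repository: str) -> str:
    """Build the token request URL from a 'Bearer realm="...",service="..."' header."""
    # Extract the key="value" pairs from the Www-Authenticate header.
    params = dict(re.findall(r'(\w+)="([^"]*)"', www_authenticate))
    # URL-encode the pull scope for the requested repository.
    scope = quote(f"repository:{repository}:pull", safe="")
    return f'{params["realm"]}?scope={scope}&service={params["service"]}'

header = 'Bearer realm="https://dockerhubProxyCache/service/token",service="harbor-registry"'
print(token_url(header, "library/php"))
# https://dockerhubProxyCache/service/token?scope=repository%3Alibrary%2Fphp%3Apull&service=harbor-registry
```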

Nginx step by step configuration

Basic configuration

First, we need to make sure that the server_name is set to “harborUrl” in the global configuration (nginx/nginx.conf), otherwise the custom configuration we’re going to add would break it.

...
  server {
    listen 8443 ssl;
    server_name harborUrl;
    ...
  }
...

Then we’re going to add our custom configuration. Create a nginx/conf.d/proxy.server.conf file with the following content:

server {
    server_name dockerhubProxyCache; # "dockerhubProxyCache" is the domain dedicated to the Docker Hub proxy 
    listen 8443 ssl;
    ssl_* ...; # Some SSL configuration (certificate, protocols, ciphers, ...), not provided here

    location / {
        proxy_set_header Host harborUrl; # "harborUrl" is the main domain of Harbor
        proxy_pass https://localhost:8443;
    }
}

Rewriting the Www-Authenticate response header

Provided that dockerhubProxyCache domain is configured to target our Harbor Nginx, this simple configuration just proxies requests on dockerhubProxyCache to the default Harbor URL (harborUrl).
If we tried to pull an image with that configuration, the first request (GET on https://dockerhubProxyCache/v2/) would result with a 401, but with an incorrect value in the Www-Authenticate header. The value would be:
Bearer realm="https://harborUrl/service/token",service="harbor-registry"

while we expect:
Bearer realm="https://dockerhubProxyCache/service/token",service="harbor-registry"

The next requests issued by the docker daemon (to get a JWT token) would then be on the incorrect domain as the daemon uses the returned realm value. To solve that issue, we have to rewrite the response header by replacing the domain:

map $upstream_http_www_authenticate $rewritten_www_authenticate_header {
    ~^(?<prefix1>.*https://).*(?<suffix1>/service/token.*)$     $prefix1$host$suffix1;
}

server {
    server_name dockerhubProxyCache; 
    listen 8443 ssl;
    ssl_* ...;

    location ~ /v2/ {
        proxy_set_header Host harborUrl; # "harborUrl" is the main domain of Harbor
        proxy_pass https://localhost:8443;
        proxy_hide_header Www-Authenticate;
        add_header Www-Authenticate $rewritten_www_authenticate_header always;
    }

    location / {
        proxy_set_header Host harborUrl;
        proxy_pass https://localhost:8443;
    }
}

The “map” block sets the $rewritten_www_authenticate_header variable from the Www-Authenticate response header by replacing the host name (harborUrl) with the dedicated proxy cache host name (dockerhubProxyCache).
The new “location” block matches requests starting with /v2/. It proxies the requests to Harbor and replaces (hide + add) the Www-Authenticate response header with the value of the $rewritten_www_authenticate_header variable.
With that configuration, the first request (GET on https://dockerhubProxyCache/v2/) will receive the expected response.
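The map’s regular expression can be sketched in Python (same pattern as the Nginx map, with the proxy host substituted where Nginx uses $host):

```python
import re

# Same pattern as the Nginx map: capture everything up to "https://" and
# everything from "/service/token" on, and substitute the proxy host in between.
PATTERN = re.compile(r'^(.*https://).*(/service/token.*)$')

def rewrite_www_authenticate(header: str, host: str) -> str:
    return PATTERN.sub(lambda m: f"{m.group(1)}{host}{m.group(2)}", header)

original = 'Bearer realm="https://harborUrl/service/token",service="harbor-registry"'
print(rewrite_www_authenticate(original, "dockerhubProxyCache"))
# Bearer realm="https://dockerhubProxyCache/service/token",service="harbor-registry"
```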

 

Rewriting the service token scope

After this first request, the docker daemon makes an authentication request (GET on https://dockerhubProxyCache/service/token?scope=repository%3Alibrary%2Fphp%3Apull&service=harbor-registry). Proxying that request as-is wouldn’t target the Harbor “hub” project. So, we need to rewrite the request arguments and introduce the “hub” project in the scope repository path, transforming scope=repository%3Alibrary%2Fphp into scope=repository%3Ahub%2Flibrary%2Fphp before proxying:

map $upstream_http_www_authenticate $rewritten_www_authenticate_header {
    ...
}
map $args $rewritten_scope_args {
    ~^(?<prefix2>.*scope=repository%3A)(?<suffix2>.*)$     ${prefix2}hub%2F${suffix2}; # "hub" is the Harbor project name used for proxy cache
}

server {
    ...
    location / {
        proxy_set_header Host harborUrl;
        proxy_pass https://localhost:8443;
        if ($args ~* ^scope=repository%3A) {
            set $args $rewritten_scope_args;
        }
    }
}

The “map” block sets the $rewritten_scope_args variable from the $args variable (corresponding to the request arguments) by inserting hub%2F (the encoded version of “hub/”) after scope=repository%3A (the encoded version of “scope=repository:”).
The additional configuration in the “location /” block applies that transformation to the request arguments when they start with scope=repository%3A.
With that configuration, the response should be a JWT giving pull access on the hub/library/php image.
Following requests on manifests and blobs endpoints will be authenticated with that JWT.
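The scope rewrite can likewise be sketched in Python (an illustrative equivalent of the map block above):

```python
import re

# Same pattern as the Nginx map: insert the Harbor project ("hub",
# URL-encoded as hub%2F) right after the encoded "repository:" prefix.
SCOPE_PATTERN = re.compile(r'^(.*scope=repository%3A)(.*)$')

def rewrite_scope(args: str, project: str = "hub") -> str:
    return SCOPE_PATTERN.sub(lambda m: f"{m.group(1)}{project}%2F{m.group(2)}", args)

print(rewrite_scope("scope=repository%3Alibrary%2Fphp%3Apull&service=harbor-registry"))
# scope=repository%3Ahub%2Flibrary%2Fphp%3Apull&service=harbor-registry
```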

 

Rewriting the manifests and blobs paths

After that authentication, the docker daemon will make requests for manifests and blobs with GET requests of the form: https://dockerhubProxyCache/v2/library/php/…. We’ll have to rewrite those requests by adding “hub” in the path (after “v2”) before proxying it in order to target the Harbor hub project:

map $upstream_http_www_authenticate $rewritten_www_authenticate_header {
    ...
}
map $args $rewritten_scope_args {
    ...
}
map $uri $rewritten_v2_uri {
    ~^/v2/(.+)$ /v2/hub/$1; # "hub" is the Harbor project name used for proxy cache
}
server {
    ...
    location ~ /v2/ {
        proxy_set_header Host harborUrl;
        proxy_pass https://localhost:8443;
        proxy_hide_header Www-Authenticate;
        add_header Www-Authenticate $rewritten_www_authenticate_header always;
        if ($request_uri ~* "^/v2/(.+)$") {
            rewrite ^/v2/(.+)$ $rewritten_v2_uri break;
        }
    }
    location / {
        ...
    }
}

The $rewritten_v2_uri variable is set via the “map” block, which transforms the request URI by adding “hub” between /v2/ and the rest of the path.
The additional configuration in the “location ~ /v2/” block applies that rewrite to requests whose path starts with “/v2/” followed by a non-empty remainder. It won’t apply to the first “/v2/” request made by the docker daemon, as that path is empty after “/v2/”, but it will apply to all manifests and blobs requests.
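Here too, the map can be sketched in Python (an illustrative equivalent of the map block above):

```python
import re

# Same pattern as the Nginx map: insert the Harbor project ("hub")
# between /v2/ and the rest of the path, only when something follows /v2/.
V2_PATTERN = re.compile(r'^/v2/(.+)$')

def rewrite_v2_path(uri: str, project: str = "hub") -> str:
    return V2_PATTERN.sub(lambda m: f"/v2/{project}/{m.group(1)}", uri)

print(rewrite_v2_path("/v2/library/php/manifests/latest"))
# /v2/hub/library/php/manifests/latest
print(rewrite_v2_path("/v2/"))  # no match: the daemon's first ping is untouched
# /v2/
```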

 

Complete Nginx configuration

After this step by step configuration, let’s regroup everything in a complete Nginx configuration:

map $args $rewritten_scope_args {
    ~^(?<prefix2>.*scope=repository%3A)(?<suffix2>.*)$     ${prefix2}hub%2F${suffix2}; # "hub" is the Harbor project name used for proxy cache
}
map $upstream_http_www_authenticate $rewritten_www_authenticate_header {
    ~^(?<prefix1>.*https://).*(?<suffix1>/service/token.*)$     $prefix1$host$suffix1;
}
map $uri $rewritten_v2_uri {
    ~^/v2/(.+)$ /v2/hub/$1; # "hub" is the Harbor project name used for proxy cache
}

server {
    server_name dockerhubProxyCache; # "dockerhubProxyCache" is the domain dedicated to the Docker Hub proxy 
    listen 8443 ssl;
    ssl_* ...; # Some SSL configuration (certificate, protocols, ciphers, ...), not provided here

    location ~ /v2/ {
        proxy_set_header Host harborUrl; # "harborUrl" is the main domain of Harbor
        proxy_pass https://localhost:8443;
        proxy_hide_header Www-Authenticate;
        add_header Www-Authenticate $rewritten_www_authenticate_header always;
        if ($request_uri ~* "^/v2/(.+)$") {
            rewrite ^/v2/(.+)$ $rewritten_v2_uri break;
        }
    }

    location / {
        proxy_set_header Host harborUrl; # "harborUrl" is the main domain of Harbor
        proxy_pass https://localhost:8443;
        if ($args ~* ^scope=repository%3A) {
            set $args $rewritten_scope_args;
        }
    }
}

Don’t forget to restart Nginx and configure your DNS so that “dockerhubProxyCache” resolves to your Harbor instance.
Then, make sure your docker daemon is configured to use that new proxy by setting /etc/docker/daemon.json to:

{
    "registry-mirrors": ["https://dockerhubProxyCache"]
}

then restart the Docker daemon to apply the changes.

Conclusion

In this article, we started by presenting a common problem (reaching the Docker Hub pull limit) with what seemed to be a simple solution (using a Harbor proxy cache). We then discovered that implementing that solution was not as straightforward as it seemed, mainly because of docker daemon configuration limitations (podman, by contrast, has a simple working configuration).
That led us to a deep analysis of the requests/responses exchanged between the docker daemon and a docker registry, to find what alterations were needed to make things work.
We finally implemented those alterations by reusing and configuring the Nginx instance already present in the Harbor architecture. With that configuration, our Harbor can be used as a proxy cache for Docker Hub and solve the pull limit problem company-wide.

Maxime Robert

Technical Expert