Recently I started playing with Metabase to create interesting visualizations and have the teams quickly check how we are working.

One of the main tools we use is GitLab, so I needed the ability to query it. To be on the safe side, I configured the backup job to create a read-only mirror of the DB, and have Metabase access it.

However, as soon as I started trying to query it, I found that the table structure used in GitLab was difficult to use for quick visualizations: while the unit our users know is their project, projects reside in a namespace, and namespaces can be nested. Also, since GitLab supports forks, you may end up with several projects that you want to aggregate stats of in order to reflect accurate numbers.

To solve this, I created a few helper queries using PostgreSQL CTEs1.

Here’s a dump of the queries I found useful, and some brief explanation of what they do.

-- flat_namespaces - for a leaf_id, obtain the full name of the namespace where it is
WITH RECURSIVE flat_namespaces AS (
    SELECT CONCAT(name) AS names, id, parent_id, id AS leaf_id
    FROM namespaces
    WHERE parent_id IS NULL
UNION ALL
    SELECT CONCAT(flat_namespaces.names, ' / ', namespaces.name) AS names, namespaces.id, namespaces.parent_id, flat_namespaces.leaf_id
    FROM namespaces
    INNER JOIN flat_namespaces ON namespaces.id = flat_namespaces.parent_id
)
-- root_projects - searches for the root project based on a candidate fork id
, root_projects(root_project_id, fork_project_id) AS (
    ( -- project with no forks
        SELECT id AS root_project_id, id AS fork_project_id
        FROM projects
        WHERE NOT EXISTS (
            SELECT 1
            FROM fork_network_members
            WHERE (forked_from_project_id = projects.id OR project_id = projects.id)
        )
    )
    UNION
    ( -- project that has forks (parent)
        SELECT DISTINCT forked_from_project_id as root_project_id, forked_from_project_id as fork_project_id
        FROM fork_network_members
    )
    UNION
    ( -- project that is a fork
        SELECT DISTINCT forked_from_project_id AS root_project_id, project_id AS fork_project_id
        FROM fork_network_members
        WHERE forked_from_project_id IS NOT NULL
    )
),
project_with_full_namespace(full_name, project_id) AS (
    SELECT CONCAT(flat_namespaces.names, ' / ', projects.name), root_projects.fork_project_id AS project_id
    FROM root_projects
    INNER JOIN projects ON projects.id = root_projects.root_project_id
    INNER JOIN flat_namespaces ON projects.namespace_id = flat_namespaces.leaf_id
)

SELECT DISTINCT *
FROM project_with_full_namespace

Let’s start from the top:

flat_namespaces will return a table containing the concatenated names, the parent namespace id, and the leaf namespace id.

So, if you have a namespace at /a/b/c, where the IDs are 1, 2, and 3 respectively, and you query it with a WHERE clause of leaf_id = 3, you’ll get a/b/c as the names value.

root_projects will return a pair of root_project_id and fork_project_id. This takes into account the three different possibilities that might exists:

  • Projects with no forks.
  • Projects with forks.
  • Forks of other projects.

Basically, with a fork_project_id you can get the corresponding root_project_id. If you couple this with flat_namespaces, for any given project ID on your GitLab installation, you can grab the root_project_id with the “full path” of namespaces as a string.

Finally, project_with_full_namespace uses the previously defined CTEs to create the “full name” of the project and all its parent namespaces, similar to what you see in GitLab’s URLs. It’s important to note that this query will ignore forks, as it uses root_projects to navigate the fork tree, but this is what we needed for our dashboards.

  1. Common Table Structure