Clustering build logs to analyze common build issues

When your company attempts to build Lossless Semantic Trees (LSTs) for all of your repositories, you may find that some of them do not build successfully. While you could go through each of those by hand and attempt to figure out common patterns, there is a better way: cluster analysis.

You can think of cluster analysis as a way of grouping similar data points together. Applied here, it takes all of your build failures and identifies which issues are the most common, so you can prioritize what to fix first.
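To make the idea concrete, here is a toy sketch of grouping failure messages by text similarity. It is not the algorithm this repository uses. The function name `cluster_messages`, the 0.7 threshold, and the sample messages are all assumptions made for illustration, and the similarity measure is simply the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def cluster_messages(messages, threshold=0.7):
    """Greedily group messages that resemble a cluster's first member."""
    clusters = []  # each cluster is a list; its first item acts as the representative
    for message in messages:
        for cluster in clusters:
            if SequenceMatcher(None, message, cluster[0]).ratio() >= threshold:
                cluster.append(message)
                break
        else:
            # No existing cluster was similar enough, so start a new one
            clusters.append([message])
    # Largest clusters first, since they point at the most common failures
    return sorted(clusters, key=len, reverse=True)


# Hypothetical failure messages, invented for this example
failures = [
    "Could not resolve dependency com.example:lib-a:1.0",
    "Could not resolve dependency com.example:lib-b:2.3",
    "Unsupported class file major version 65",
    "Could not resolve dependency org.sample:core:4.1",
    "Unsupported class file major version 61",
]

for cluster in cluster_messages(failures):
    print(len(cluster), cluster[0])
```

With this sample input, the three dependency-resolution failures end up in one group and the two class-file failures in another, which is the kind of grouping that tells you where to look first.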

This repository will walk you through everything you need to do to perform a cluster analysis on your build failures. By the end, you will have produced two HTML files:

one that visually displays the clusters (clusters_scatter.html)
one that contains samples for each cluster (cluster_logs.html)

Note

Clustering is currently limited to Maven, Gradle, .NET, and Bazel builds because our heuristic-based extraction of build errors is specific to these build types. Build failures from other build types won't cause errors during clustering, but the heuristic extraction may overlook valuable parts of the stack trace.
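As a rough illustration of what heuristic extraction can look like, the sketch below keeps only the lines that match a few well-known failure markers. The marker list, the `extract_error_lines` function, and the sample log are hypothetical and are not the patterns used by this repository's scripts. The markers themselves are real conventions: Maven prefixes error lines with `[ERROR]`, and Gradle prints `FAILURE: Build failed` followed by a `* What went wrong:` section.

```python
import re

# Hypothetical markers for illustration; not the repository's actual heuristics
ERROR_MARKERS = [
    re.compile(r"^\[ERROR\]"),              # Maven error lines
    re.compile(r"^FAILURE: Build failed"),  # Gradle failure banner
    re.compile(r"^\* What went wrong:"),    # Gradle failure summary header
]


def extract_error_lines(log_text, context=1):
    """Return lines matching a marker, plus `context` lines after each match."""
    lines = log_text.splitlines()
    keep = set()
    for index, line in enumerate(lines):
        if any(marker.search(line) for marker in ERROR_MARKERS):
            keep.update(range(index, min(index + context + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]


# Invented sample log in the style of Maven output
sample_log = """\
[INFO] Scanning for projects...
[INFO] Building demo 1.0
[ERROR] Failed to execute goal on project demo
[ERROR] Could not resolve dependencies
[INFO] BUILD FAILURE
"""

print("\n".join(extract_error_lines(sample_log)))
```

A log from a build tool with no matching markers yields an empty list here, which shows why marker-based extraction can miss the useful part of a log for unsupported build types.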

Setup

Before you begin, complete one of the setup methods described in LOCAL_INSTALL.md to ensure that all the necessary dependencies are installed. The available methods are:

Using System Python with venv
Using uv (Fast Python Package Installer)
Using DevContainer
Using Docker

Instructions

After setup, you can run the analysis script in one of two ways:

Analyze build logs directly
Download build logs from an Artifactory repository (optional)

1. Analyze build logs directly

If you already have the build log files locally on your machine, you can analyze them in-place using the analyze subcommand. Here's how to run it:

python scripts/analyze_logs.py analyze <output_dir>

Using uv

uv run scripts/analyze_logs.py analyze <output_dir>

Using Docker

docker run --rm -it \
  -v <path_to_output_dir>:/app/output \
  moderne-cluster-build-logs:latest \
  python analyze_logs.py analyze /app/output

Analysis with `--from` option

If your logs are located in a different directory, use the --from option to specify the path to your local log directory.

python scripts/analyze_logs.py analyze <output_dir> --from <path_to_build_logs>

Using uv

uv run scripts/analyze_logs.py analyze <output_dir> --from <path_to_build_logs>

Using Docker

docker run --rm -it \
  -v <path_to_build_logs>:/app/logs \
  -v <path_to_output_dir>:/app/output \
  moderne-cluster-build-logs:latest \
  python analyze_logs.py analyze /app/output --from /app/logs

2. Download build logs from an Artifactory repository

If your build logs are stored in an Artifactory repository, use the download subcommand to fetch them:

python scripts/analyze_logs.py download \
  --url <artifactory_url> \
  --repository-path <artifactory_repository_path_to_logs> \
  --username <artifactory_username> \
  --password <artifactory_passwd> \
  <path_to_output_dir>

Using uv

uv run scripts/analyze_logs.py download \
  --url <artifactory_url> \
  --repository-path <artifactory_repository_path_to_logs> \
  --username <artifactory_username> \
  --password <artifactory_passwd> \
  <path_to_output_dir>

Using Docker

docker run --rm -it \
  -v <path_to_output_dir>:/app/output \
  moderne-cluster-build-logs:latest \
  python analyze_logs.py download \
  --url <artifactory_url> \
  --repository-path <artifactory_repository_path_to_logs> \
  --username <artifactory_username> \
  --password <artifactory_passwd> \
  /app/output
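
If you would rather not type credentials into your shell, one option is to assemble the download command from environment variables in a small wrapper. The sketch below only builds the argument list from the flags documented above. The function name `build_download_command` and the variable names `ARTIFACTORY_USERNAME` and `ARTIFACTORY_PASSWORD` are assumptions for this example, not something the script itself reads.

```python
import os


def build_download_command(url, repository_path, output_dir, env=None):
    """Assemble the documented `download` invocation as an argument list."""
    env = os.environ if env is None else env
    return [
        "python", "scripts/analyze_logs.py", "download",
        "--url", url,
        "--repository-path", repository_path,
        # Hypothetical variable names; choose whatever fits your environment
        "--username", env["ARTIFACTORY_USERNAME"],
        "--password", env["ARTIFACTORY_PASSWORD"],
        output_dir,
    ]


# Demo with placeholder values instead of real credentials
demo_env = {"ARTIFACTORY_USERNAME": "alice", "ARTIFACTORY_PASSWORD": "s3cret"}
command = build_download_command(
    "https://artifactory.example.com", "build-logs", "output", env=demo_env
)

# Mask the password before printing the command
masked = ["****" if arg == demo_env["ARTIFACTORY_PASSWORD"] else arg for arg in command]
print(" ".join(masked))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. The password still reaches the script as a command-line argument, so it may be visible to other processes on the same machine while the download runs.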

Example results

Following the steps above produces the two HTML files described below.

`clusters_scatter.html`

This file is a visual representation of the build failure clusters. Each dot represents a build log, so clusters with more dots should generally be prioritized over those with fewer. You can hover over a dot to see part of its build log.
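
The same prioritization can be expressed numerically. The sketch below assumes you have one cluster label per failed build. The `labels` list and the `rank_clusters` function are hypothetical and do not reflect a format this tool is known to export.

```python
from collections import Counter


def rank_clusters(labels):
    """Return (label, count, share) tuples, largest cluster first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count, count / total) for label, count in counts.most_common()]


# Hypothetical cluster labels, one per failed build
labels = [0, 2, 0, 1, 0, 2, 0]

for label, count, share in rank_clusters(labels):
    print(f"cluster {label}: {count} builds ({share:.0%})")
```

Sorting by count mirrors what the scatter plot shows visually: the densest clusters account for the largest share of failures.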

`cluster_logs.html`

Use this file to see the full extracted logs. It lists all of the logs that belong to each cluster.
