Metal and CoreML Backends #865

Draft
wants to merge 412 commits into base: master

Conversation

@ChinChangYang (Contributor) commented Dec 16, 2023

Summary:

KataGo takes advantage of Apple Silicon through a Metal Performance Shaders Graph backend and a CoreML backend. The Metal backend provides GPU acceleration, while the CoreML backend runs inference on the Neural Engine, giving KataGo strong performance on Apple hardware.

Documentation for Metal and CoreML Backends in KataGo:

https://github.com/ChinChangYang/KataGo/blob/metal-coreml-stable/docs/CoreML_Backend.md

Release:

https://github.com/ChinChangYang/KataGo/releases

Resolve:

Scroll to the last message when the `messages` array changes by using the ID of the last message. Also, create a message task on the initial view appearance to fetch messages from KataGo and continuously append them to the list of messages.

Created an infinite while loop in the `createMessageTask` function to continuously fetch new messages from KataGo and append them to the list.
- The message ID generation is fixed to use UUID instead of a custom implementation.
- Unnecessary code related to managing message IDs is removed.

The previous implementation used a custom MessageId actor to generate and manage message IDs. This commit replaces that with UUID to generate a unique ID for each message. The unnecessary code related to managing message IDs, including the MessageId actor and its methods, is removed.
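
Taken together, the commits above describe a message list that auto-scrolls to the newest entry and a long-running task that keeps pulling engine output. A minimal SwiftUI sketch of that pattern is shown below; `KataGoHelper.getMessageLine()` is a hypothetical stand-in for whatever call the app actually uses to read one line of KataGo output.

```swift
import SwiftUI

struct Message: Identifiable, Equatable {
    let id = UUID()          // UUID replaces the old MessageId actor
    let text: String
}

struct CommandView: View {
    @State private var messages: [Message] = []

    var body: some View {
        ScrollViewReader { proxy in
            ScrollView {
                ForEach(messages) { message in
                    Text(message.text).id(message.id)
                }
            }
            // Scroll to the last message whenever the array changes.
            .onChange(of: messages) { _ in
                if let last = messages.last {
                    proxy.scrollTo(last.id)
                }
            }
        }
        .task {
            // Infinite loop: keep pulling lines from KataGo and appending them.
            while true {
                let line = await KataGoHelper.getMessageLine() // hypothetical API
                messages.append(Message(text: line))
            }
        }
    }
}
```
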
- Added a `@State` property `command` to track the user's input.
- Created a `TextField` for the user to enter their message.
- Added an `onSubmit` action to send the entered command to KataGoHelper and clear the input.
- Added a `Button` to send the command to KataGoHelper and clear the input when pressed.

This change enhances the user interface by allowing the user to send commands to the KataGo GTP engine from the app.
The nullability annotation for the `sendCommand` method was fixed so that a non-null `command` parameter is expected. This improves code clarity and helps prevent potential runtime issues.
- The `getOutputWithBinInputs` method's output variable names have been updated to improve readability and consistency. This commit changes `policyOutput` to `policyOutputs`, `valueOutput` to `valueOutputs`, `ownershipOutput` to `ownershipOutputs`, `miscValuesOutput` to `miscValueOutputs`, and `moreMiscValuesOutput` to `moreMiscValueOutputs`.
The `init(text: String) async` method has been changed to `init(text: String)` in order to remove the `async` attribute. Now, when entering a GTP command in the TextField, it will disable autocorrection and autocapitalization. The `onSubmit` action has been updated to append a new Message to the list of messages before sending the command. Additionally, the `await` operator has been removed from the creation of a new Message object.
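
The input field described in the surrounding commits might look roughly like the sketch below. It reuses the `Message` type from the earlier sketch; `KataGoHelper.sendCommand` is the method named in this pull request, though its exact Swift-side signature is assumed here.

```swift
struct CommandInput: View {
    @State private var command = ""
    @Binding var messages: [Message]

    var body: some View {
        HStack {
            TextField("Enter a GTP command", text: $command)
                .disableAutocorrection(true)
                .textInputAutocapitalization(.never)
                .onSubmit { send() }
            Button("Send") { send() }
        }
    }

    private func send() {
        // Echo the command into the message list, forward it to the engine, then clear the field.
        messages.append(Message(text: command))
        KataGoHelper.sendCommand(command)
        command = ""
    }
}
```
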
- Added `CommandButton` struct to display command buttons with specific titles and actions (a sketch follows this list).
- Included buttons for `genmove b`, `genmove w`, `showboard`, and `clear_board`.
- Initialized message task by adding `Initializing...` message and sending `showboard` command.
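
A `CommandButton` along these lines would cover the buttons listed above; the surrounding `CommandBar` container is a hypothetical name used only for illustration.

```swift
struct CommandButton: View {
    let title: String
    let action: () -> Void

    var body: some View {
        Button(title, action: action)
            .buttonStyle(.bordered)
    }
}

struct CommandBar: View {
    var body: some View {
        HStack {
            CommandButton(title: "genmove b") { KataGoHelper.sendCommand("genmove b") }
            CommandButton(title: "genmove w") { KataGoHelper.sendCommand("genmove w") }
            CommandButton(title: "showboard") { KataGoHelper.sendCommand("showboard") }
            CommandButton(title: "clear_board") { KataGoHelper.sendCommand("clear_board") }
        }
    }
}
```
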
This commit adds the GobanView.swift file to KataGo iOS, which includes functions for rendering a Go board. The file defines a SwiftUI view called GobanView, which is responsible for drawing the background, lines, and star points of the board. It also calculates the dimensions of the board based on the available geometry. The GobanView struct is previewed in the GobanView_Previews struct.
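
A stripped-down version of such a board view, assuming a 19x19 board and simplified dimension math, could be sketched as follows:

```swift
struct GobanView: View {
    let boardSize = 19

    var body: some View {
        GeometryReader { geometry in
            // Derive the grid spacing from the available space (simplified).
            let side = min(geometry.size.width, geometry.size.height)
            let square = side / CGFloat(boardSize + 1)
            let origin = square

            ZStack {
                Color(red: 0.86, green: 0.69, blue: 0.42) // wooden background

                Path { path in
                    for i in 0..<boardSize {
                        let offset = origin + CGFloat(i) * square
                        // Horizontal line
                        path.move(to: CGPoint(x: origin, y: offset))
                        path.addLine(to: CGPoint(x: origin + CGFloat(boardSize - 1) * square, y: offset))
                        // Vertical line
                        path.move(to: CGPoint(x: offset, y: origin))
                        path.addLine(to: CGPoint(x: offset, y: origin + CGFloat(boardSize - 1) * square))
                    }
                }
                .stroke(Color.black, lineWidth: 1)

                // Star points (hoshi) for a 19x19 board.
                ForEach([3, 9, 15], id: \.self) { x in
                    ForEach([3, 9, 15], id: \.self) { y in
                        Circle()
                            .frame(width: square / 4, height: square / 4)
                            .position(x: origin + CGFloat(x) * square,
                                      y: origin + CGFloat(y) * square)
                    }
                }
            }
        }
    }
}
```
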
This commit adds the CommandView.swift file, which contains the implementation of a view for handling commands and displaying messages. The CommandView struct includes properties and functionality for managing a list of messages, handling GTP commands, and displaying the messages in a scrollable view.
- Added a new CommandView tab for entering GTP commands and displaying messages.
- Added a new GobanView tab for displaying the Goban interface.
- Updated ContentView to use TabView to switch between tabs.
- Change maxTime value from 10 to 1 second for capping search time.
This commit adds the ability to draw black and white stones on the GobanView. The `drawBlackStone` and `drawWhiteStone` functions are implemented to draw the stones at specific coordinates. The `drawStones` function is added to the `GobanView` and calls the stone-drawing functions to draw several stones on the board.
1. Rename `StarPoint` struct to `BoardPoint` for clearer semantics.
2. Modify `drawStones` method to use ForEach for better maintainability.
3. Revise stone rendering with gradient and shadow optimizations (see the sketch after this list).
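
The stone rendering described here (and refined by the `StoneView` extraction and gradient tweaks in later commits below) might be sketched like this; `StonesLayer` and the exact colors are illustrative only:

```swift
struct BoardPoint: Hashable {
    let x: Int
    let y: Int
}

struct StoneView: View {
    let color: Color        // .black or .white
    let diameter: CGFloat

    var body: some View {
        ZStack {
            Circle()
                .fill(color)
            // Simple highlight to suggest a light source.
            Circle()
                .fill(
                    RadialGradient(colors: [Color.white.opacity(0.6), .clear],
                                   center: .init(x: 0.35, y: 0.35),
                                   startRadius: 0,
                                   endRadius: diameter * 0.6)
                )
        }
        .frame(width: diameter, height: diameter)
        .shadow(radius: diameter * 0.05)
    }
}

struct StonesLayer: View {
    let blackPoints: [BoardPoint]
    let whitePoints: [BoardPoint]
    let square: CGFloat
    let origin: CGFloat

    var body: some View {
        ForEach(blackPoints, id: \.self) { point in
            StoneView(color: .black, diameter: square)
                .position(x: origin + CGFloat(point.x) * square,
                          y: origin + CGFloat(point.y) * square)
        }
        ForEach(whitePoints, id: \.self) { point in
            StoneView(color: .white, diameter: square)
                .position(x: origin + CGFloat(point.x) * square,
                          y: origin + CGFloat(point.y) * square)
        }
    }
}
```
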
- `CommandView` now uses a `messagesObject` environment object instead of a local state variable for managing messages.
- The `CommandView` no longer starts a thread in the `init()` method.
- The `CommandView` now retrieves messages from `messagesObject` and appends new messages to it.
- The `createMessageTask()` method has been moved to the `ContentView` and is now responsible for appending new messages to `messagesObject`.
- The `ContentView` now initializes and uses `stones` and `messagesObject` as environment objects.
- The `createMessageTask()` method in `ContentView` now retrieves messages from KataGo and appends them to `messagesObject`.

This commit introduces changes to improve the message management in the CommandView and ContentView structures.
…nd GobanView

The commit adds the stones and board objects as environment objects for the CommandView and GobanView structs in ContentView.swift. The stones object is added to the environment for CommandView, and the stones and board objects are added to the environment for GobanView. These environment objects allow these structs to access and update the state of the stones and board objects.
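
In rough outline, the shared-state wiring described above (combined with the TabView from the earlier commit) could look like the sketch below; `getMessageLine()` remains a hypothetical API, `Message` and `BoardPoint` come from the earlier sketches, and `CommandView`/`GobanView` are the views sketched previously.

```swift
final class MessagesObject: ObservableObject {
    @Published var messages: [Message] = []
}

final class Stones: ObservableObject {
    @Published var blackPoints: [BoardPoint] = []
    @Published var whitePoints: [BoardPoint] = []
}

struct ContentView: View {
    @StateObject private var messagesObject = MessagesObject()
    @StateObject private var stones = Stones()

    var body: some View {
        TabView {
            CommandView()
                .tabItem { Label("Command", systemImage: "text.bubble") }
            GobanView()
                .tabItem { Label("Goban", systemImage: "circle.grid.3x3") }
        }
        .environmentObject(messagesObject)
        .environmentObject(stones)
        .task {
            // The message task now lives in ContentView and feeds the shared object;
            // CommandView reads it via @EnvironmentObject var messagesObject: MessagesObject.
            while true {
                let line = await KataGoHelper.getMessageLine()
                messagesObject.messages.append(Message(text: line))
            }
        }
    }
}
```
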
…banView to allow tapping on the board to make a move.
- Adjust the calculation of squareWidth and squareHeight to include an additional space for the board width and height, respectively (see the sketch after this list).
- Update the frame width and height of the Image in the GobanView.
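
The dimension change and the tap-to-move behavior could be captured by a small helper like the hypothetical `BoardGeometry` below (`BoardPoint` is from the earlier sketch):

```swift
struct BoardGeometry {
    let squareWidth: CGFloat
    let squareHeight: CGFloat
    let marginWidth: CGFloat
    let marginHeight: CGFloat

    // Add one extra square of space in each direction so stones on the
    // edge lines are not clipped (the "+ 1" described above).
    init(size: CGSize, boardWidth: Int, boardHeight: Int) {
        squareWidth = size.width / CGFloat(boardWidth + 1)
        squareHeight = size.height / CGFloat(boardHeight + 1)
        marginWidth = squareWidth
        marginHeight = squareHeight
    }

    // Map a tap location back to the nearest board intersection.
    func point(at location: CGPoint) -> BoardPoint {
        let x = Int(((location.x - marginWidth) / squareWidth).rounded())
        let y = Int(((location.y - marginHeight) / squareHeight).rounded())
        return BoardPoint(x: x, y: y)
    }
}
```

In the view, an iOS 16 `.onTapGesture { location in … }` or a zero-distance `DragGesture` can supply the touch location, which `point(at:)` converts to a board coordinate before the app sends the corresponding GTP `play` command.
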
Extract StoneView.swift from GobanView.swift to improve readability and maintainability.
The light effect in the StoneView component has been updated to include an additional color stop to create a more prominent effect. The start and end radii of the RadialGradient have also been adjusted for better visual appearance.

Also, the radius of the blur applied to the stone color circle has been reduced to improve the overall appearance of the StoneView component.

Additionally, the dimensions object has been assigned to a separate variable for better readability and code organization.
This commit adds the AnalysisView.swift file, which contains the code for visualizing the analysis data. The AnalysisView struct displays circles on the screen based on the analysis data. The size, position, color, and visibility of each circle are determined by the data. The AnalysisView_Previews struct is also defined to provide a preview of the view.
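
The analysis overlay could be approximated as below; `AnalysisInfo` and its fields are assumptions about the data model, not the actual types in the PR.

```swift
struct AnalysisInfo: Identifiable {
    let id = UUID()
    let point: BoardPoint
    let winrate: Double     // 0...1, used for color
    let visits: Int         // used for size and visibility
}

struct AnalysisView: View {
    let analysis: [AnalysisInfo]
    let maxVisits: Int
    let square: CGFloat
    let origin: CGFloat

    var body: some View {
        ForEach(analysis) { info in
            let scale = CGFloat(info.visits) / CGFloat(max(maxVisits, 1))
            Circle()
                .fill(Color(hue: 0.33 * info.winrate, saturation: 0.8, brightness: 0.8))
                .opacity(scale < 0.02 ? 0 : 0.7)           // hide rarely-visited moves
                .frame(width: square * scale, height: square * scale)
                .position(x: origin + CGFloat(info.point.x) * square,
                          y: origin + CGFloat(info.point.y) * square)
        }
    }
}
```
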
- The command view in this commit has been refactored to add a new state property called `isHidden`, which determines whether to hide the view or not.
- With the toggling functionality implemented, the code now checks the value of `isHidden` to determine whether to show the scroll view and the text field.

Note: The isHidden property is set to false on appear and true on disappear.
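
A compact sketch of that show/hide behavior, using the `isHidden` state and the appear/disappear hooks noted above:

```swift
struct CommandView: View {
    @State private var isHidden = false
    @EnvironmentObject var messagesObject: MessagesObject

    var body: some View {
        VStack {
            if !isHidden {
                // Scroll view with messages and the GTP command text field live here.
                List(messagesObject.messages) { Text($0.text) }
                TextField("Enter a GTP command", text: .constant(""))
            }
        }
        .onAppear { isHidden = false }
        .onDisappear { isHidden = true }
    }
}
```
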
- Change maxTime value from 1 to 0.1 in default_gtp.cfg
This commit updates the CoreML model references in the GitHub Actions workflow and the setup script to the latest versions (v1.15.1) from the KataGo GitHub repository.

**Changes include:**

1. **GitHub Actions Workflow Updates:**
   - Replaced the model URLs for FP16 and FP32 models in multiple steps to use the new version `v1.15.1-coreml2`:
     - **FP16 Model**: Updated from `KataGoModel19x19fp16v14s7709731328.mlpackage.zip` to `KataGoModel19x19fp16v14s9996604416.mlpackage.zip`.
     - **FP32 Model**: Updated from `KataGoModel19x19fp32v14s7709731328.mlpackage.zip` to `KataGoModel19x19fp32v14s9996604416.mlpackage.zip`.
     - **FP32 Meta Model**: Updated from `KataGoModel19x19fp32meta1.mlpackage.zip` to `KataGoModel19x19fp32v15m1humanv0.mlpackage.zip`.
   - Ensured symbolic links point to the updated model names.

2. **Setup Script Updates:**
   - Updated the model download command for FP16 in the setup script to reflect the new version `KataGoModel19x19fp16v14s9996604416.mlpackage.zip`.
   - Added commands to download and setup the new FP32 model version `KataGoModel19x19fp32v15m1humanv0.mlpackage.zip`.
   - Adjusted the unzip command and file renaming for consistency with new model names.

**Impact:**
These changes ensure that the workflow and setup scripts use the latest models, which may include performance improvements and updates. This is crucial for maintaining compatibility and leveraging the latest features provided by the KataGo models.

**Note:**
The old model versions have been phased out from the scripts, and the new versions maintain the existing symbolic link structure for seamless integration in the build process.
This commit updates the documentation in the `CoreML_Backend.md` file to reflect the changes in the KataGo model versions and includes necessary adjustments for downloading and linking models. Key changes include:

- Updated the download links for the binary models to the latest version `v1.15.1-coreml2`, replacing the previous version `v1.13.2-coreml2`.
- Updated the symbolic links to reflect the new model filenames corresponding to the latest releases.
- Adjusted benchmark, GTP, and analysis command examples to use the new binary model filenames.
- Replaced the outdated human-trained CoreML model download link with the updated model from `v1.15.1-coreml2`.
- Enhanced clarity on linking the human-trained CoreML model in the run directory.
- Reintroduced the section for updating the human-trained CoreML model, including instructions for downloading the checkpoint and converting it to a CoreML model.

These changes ensure that the documentation provides accurate and up-to-date instructions for utilizing the CoreML backend with the latest models available.
This commit enhances the `createComputeHandle` function within the `NeuralNet` class to ensure that the instantiation of the `ComputeHandle` object is thread-safe. The modification employs a mutex to prevent simultaneous access to the critical section of code responsible for creating the `ComputeHandle` instance.

**Changes Made:**
- Introduced a static mutex variable `computeHandleMutex` to synchronize access to the `ComputeHandle` creation logic.
- Encapsulated the instantiation of `ComputeHandle` within a lock guard (`std::lock_guard`) to lock the mutex and ensure that only one thread can execute the instantiation at any given time.
- Ensured that the lock is held only during the critical section where the `ComputeHandle` instance is created, thereby minimizing contention and maximizing efficiency for other threads that might be attempting to use the `createComputeHandle` method concurrently.

**Rationale:**
The previous implementation of `createComputeHandle` allowed concurrent invocations that could lead to race conditions during the creation of `ComputeHandle`, especially since this operation involves writing data to the file system. By enforcing thread safety, we minimize the risk of corruption and enhance the robustness of the neural network's backend processing capabilities.

**Related Issues:**
- This commit addresses potential threading issues observed in previous GitHub Actions test runs.
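
The actual change is C++ (a static `std::mutex` guarded by `std::lock_guard`); purely as an illustration of the same serialization pattern in this page's Swift examples, with hypothetical names:

```swift
import Foundation

struct ComputeHandle {
    let modelPath: String
}

enum ComputeHandleFactory {
    private static let computeHandleLock = NSLock()

    static func createComputeHandle(modelPath: String) -> ComputeHandle {
        // Hold the lock only around the critical section that builds the handle,
        // which may write compiled model files to disk.
        computeHandleLock.lock()
        defer { computeHandleLock.unlock() }
        return ComputeHandle(modelPath: modelPath)
    }
}
```
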
Updated the model download links in the build workflow and setup script
from version v1.13.2-coreml1 to v1.15.1-coreml2 to ensure compatibility
and resolve issues related to the GPU error test.
This commit updates the version number in the source code to reflect the new coreml3 version. Both the getKataGoVersion and getKataGoVersionForHelp methods have been modified to return the updated version string.
- Renamed the meta encoder version prefix from "meta" to "m" in convert_coreml_pytorch.py for enhanced consistency.
- Updated CoreML_Backend.md to format the model directory name as code, improving clarity.
**Description:**
This commit introduces a new feature to compress the CoreML model after conversion from PyTorch. The following changes were made:

- Imported `coremltools.optimize` to leverage optimization functionalities for model compression.
- Moved the definition of the model file name to a new location for better readability.
- Added a model compression process:
  - Configured the palettization with a bit depth of 8 bits.
  - Created an optimization configuration using the defined palettization options.
  - Implemented the palettization of the model weights, resulting in a compressed model.
  - Defined a new file naming convention for the compressed model that indicates the bit configuration.
  - Implemented saving for the compressed model, followed by logging the location of the saved file.

**Impact:**
This enhancement aims to reduce the size of the finalized CoreML model, improving storage efficiency and potentially speeding up the inference process when deployed on resource-constrained environments.
…ility

This commit introduces a new method, `safelyPredict`, in the `CoreMLBackend` class to improve the robustness of the model's prediction capabilities. The following changes have been made:

1. **Retry Logic for Predictions:**
   - The `safelyPredict` function attempts to execute a prediction using the CoreML model up to two times. This is to catch transient errors that may arise during the prediction process.
   - If both attempts fail, the function falls back to a third attempt using a model compiled for CPU execution.

2. **Model Compilation Improvement:**
   - The model is now compiled with flexible compute units, allowing for better resource management based on the device's capabilities. The transition from using a boolean `useCpuAndNeuralEngine` flag to `MLComputeUnits` increases clarity and future-proofs the method by accommodating additional compute configurations.

3. **Code Refactoring:**
   - Updated the `init` method of `CoreMLBackend` and several references to the `compileBundleMLModel` method to align with the new parameters.
   - Adjusted corresponding unit tests in `CoreMLModelTest` to align with the new parameters.

4. **Error Handling:**
   - Introduced enhanced error handling within the `safelyPredict` method, ensuring that any issues during the prediction process are properly managed and do not crash the application.
Changed the `model` property in `CoreMLBackend` from a constant to a variable to allow reassignment when recompiling the model.

- Updated the `safelyPredict` function to handle prediction failures more gracefully:
  - Reorganized the logic to include a loop that attempts compilation and prediction with both cached and recompilation strategies.
  - Introduced a new private method `compileAndPredict` to encapsulate the model compilation and prediction logic, improving code readability and maintainability.

- Enhanced the `KataGoModel` class by modifying the `compileBundleMLModel` and `compileMLModel` methods to accept a `mustCompile` parameter, allowing conditional recompilation of the model based on input flags.

- This change addresses issues where the model fails to produce valid predictions by ensuring a fresh compilation under specific circumstances, improving overall reliability when predicting with CoreML models (a rough sketch of the flow follows).
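
A condensed Swift sketch of the retry-and-recompile flow described in the last two commits (names approximate; the real `CoreMLBackend` and `KataGoModel` code differs in detail):

```swift
import CoreML

final class CoreMLBackend {
    private var model: MLModel          // var, so it can be replaced after recompilation
    private let modelURL: URL           // source .mlpackage / .mlmodel URL
    private let computeUnits: MLComputeUnits

    init(modelURL: URL, computeUnits: MLComputeUnits) throws {
        self.modelURL = modelURL
        self.computeUnits = computeUnits
        self.model = try CoreMLBackend.compile(modelURL, computeUnits: computeUnits)
    }

    // Try the cached model first; on failure recompile and retry; finally fall back to CPU-only.
    func safelyPredict(from input: MLFeatureProvider) throws -> MLFeatureProvider {
        for mustCompile in [false, true] {
            if mustCompile {
                model = try CoreMLBackend.compile(modelURL, computeUnits: computeUnits)
            }
            if let output = try? model.prediction(from: input) {
                return output
            }
        }
        let cpuModel = try CoreMLBackend.compile(modelURL, computeUnits: .cpuOnly)
        return try cpuModel.prediction(from: input)
    }

    private static func compile(_ url: URL, computeUnits: MLComputeUnits) throws -> MLModel {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = computeUnits
        let compiledURL = try MLModel.compileModel(at: url)
        return try MLModel(contentsOf: compiledURL, configuration: configuration)
    }
}
```

The key points are that `model` is now a `var` so it can be replaced after recompilation, and that a CPU-only compilation serves as the final fallback when the preferred compute units keep failing.
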
This update introduces a new optional argument, `-nbits`, that allows users to specify the number of bits to use when palettizing model weights. The weights are palettized during conversion, improving flexibility and enabling different quantization levels based on user preference. The code also handles cases where no palettization is applied.
- Introduced a new command-line argument `-sparsity` to specify the target sparsity level for pruning weights during model conversion.
- Updated the CoreML model conversion process to include a sparsity configuration that prunes weights according to the specified target.
- Adjustments made to ensure that models can be converted with both weight pruning and quantization.
- Introduced OpLinearQuantizerConfig and linear_quantize_weights functions.
- Added support for 8-bit weight quantization based on a predefined weight threshold.
- Enhanced the existing weight pruning process to include joint compression options.
- Updated argument handling for sparsity, ensuring default values are set correctly.
Updated `convert_coreml_pytorch.py` to add a sparsity description for pruned models and modified the compression description for better clarity. Now includes default empty sparsity description when no pruning is applied.
- Introduced a new argument '-prune-to-zero' to allow users to prune all weights to zero, creating a null model during export.
- Updated the `write_weights` function to handle the new pruning logic, ensuring models can be exported as zero-weight models if desired.
…lity

- Added detailed docstrings to functions for better documentation.
- Separated version printing into a dedicated function.
- Consolidated argument parsing into a single function for clarity.
- Modularized model tracing and conversion logic for better separation of concerns.
- Improved handling of optional parameters with defaults.
- Enhanced error handling with try-except block in the main execution flow.
- Cleaned up variable names and function calls for readability.

This refactoring aims to improve maintainability and enhance the clarity of the code structure while preserving existing functionality.
- Updated nbits choices to include 6, 3, and additional granularity options.
- Changed the quantization mode to "linear" for improved accuracy.
- Enhanced the palettization configuration with 'kmeans' mode and per-grouped channel granularity for better performance.
- Removed unnecessary weight threshold parameter in quantization for cleaner code.

These changes optimize the quantization process, improving both accuracy and latency.
Updated the logic for determining the meta encoder version to handle cases where the metadata encoder is not present or the version is missing from the configuration. This ensures the correct version is set and prevents errors during conversion.
Enhanced the logic for determining the minimum deployment target based on model sparsity and the number of bits specified. The updated conditions provide clearer handling for different scenarios, ensuring compatibility with iOS16 for 8-bit models while maintaining support for iOS18 for others.
- Updated script calls in export_model_for_selfplay.sh, shuffle.sh, shuffle_loop.sh, and train.sh to use `python` instead of `python3` for better compatibility with Miniconda environment.
- Enhanced GPU handling in train.py to correctly utilize MPS (Metal Performance Shaders) on macOS devices.
Enhanced the `convert_coreml_pytorch.py` script by introducing an optional `-output` argument. This allows users to specify a custom path for the converted Core ML package, improving flexibility in model saving. Updated the `save_coreml_model` function to handle the new output path.
This update modifies the configuration files gatekeeper1_maxsize9.cfg and selfplay1_maxsize9.cfg to enhance performance when using the Metal backend in KataGo. The number of game threads has been reduced from 128 to 16 to optimize resource allocation for the Metal architecture. Additionally, the neural network maximum batch size has been decreased from 128 to 8, and the number of neural network server threads per model has been increased from 1 to 2 to improve parallel execution. These adjustments aim to enhance training efficiency on the Metal backend (the relevant keys are illustrated below).
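
For illustration, the affected keys in a KataGo self-play configuration change roughly as follows (key names as used in KataGo's self-play configs):

```
numGameThreads = 16             # was 128
nnMaxBatchSize = 8              # was 128
numNNServerThreadsPerModel = 2  # was 1
```
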
- Updated self-play, allowing specification of the Core ML model directory for loading.
- Enhanced Core ML backend to accept and utilize a model directory, ensuring more flexible model management.
- Modified various neural network backends to compile with the specified directory path.
- Added new arguments for Core ML model files in KataGoCommandLine:
  - coreMLModelFileArg for the core ML model file.
  - humanCoreMLModelFileArg for the human core ML model file.
- Refactored gatekeeper to initialize neural network evaluators with Core ML model paths provided by the user.
- Changed references from NNEvaluator to initializeCoreMLEvaluator for both test and accepted models.
…permanent URL

This change ensures that each CoreML model instance compiles to its own unique URL. Instead of checking for existing model digests to decide compilation, the model is always compiled and saved to a new URL. This resolves potential conflicts when multiple instances attempt to load from the same permanent URL, ensuring accurate predictions for each model instance. Updated the compileMLModel method accordingly.
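
A minimal sketch of compiling to a per-instance URL (the function name here is hypothetical; the PR modifies `compileMLModel` to behave along these lines):

```swift
import CoreML

// Compile the source model, then move the result to a URL unique to this
// instance so concurrent backends never load from the same permanent path.
func compileToUniqueURL(modelURL: URL) throws -> MLModel {
    let compiledURL = try MLModel.compileModel(at: modelURL)      // temporary .mlmodelc
    let uniqueURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("KataGoModel-\(UUID().uuidString).mlmodelc")
    try FileManager.default.moveItem(at: compiledURL, to: uniqueURL)
    return try MLModel(contentsOf: uniqueURL)
}
```
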
Updated the gatekeeper1.cfg, gatekeeper1_maxsize9.cfg, selfplay1.cfg, and selfplay1_maxsize9.cfg configuration files to utilize the Neural Engine (NPU) instead of the GPU.

Key changes include:
- Reduced the number of game threads from 128 to 16 for better performance.
- Decreased the neural network maximum batch size from 128 to 8.
- Increased the number of neural network server threads per model from 1 to 2 for improved parallel processing.

These modifications aim to switch to the Neural Engine during training and self-play.