add hardware guide for bioimage analysis #47

Open: wants to merge 10 commits into base: main. Changes from 5 commits.
Binary file added docs/01_introduction/SoC.png
112 changes: 112 additions & 0 deletions docs/01_introduction/hardware.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# Choosing the Optimal Computer

Though Python runs on most modern operating systems (OS), including Windows, macOS and Linux, it is beneficial to keep scripting in a *nix environment. Here we provide a guide to help beginners choose their computing hardware.

This guide is intentionally written for programming beginners who code locally. For advanced research units equipped with Python servers, we will cover a series of remote coding techniques that enable more complex bioimage analysis.
**Owner** suggested change (removing the second sentence):

> This guide is intentionally written for programming beginners to code locally.

I'm removing the sentence about remote coding, because the mentioned series is not linked yet. We can add the reference back once these tutorials are written.

**Author:** Removed accordingly.


## General Guide

When choosing a computer for bioimage analysis, it's essential to consider hardware performance, memory size, OS, portability, and application scenarios. Here’s a summary comparing different computing modalities:

| Feature | Laptops | Desktops | Workstations | Servers |
|---------------------------|------------------------------------------|------------------------------------------|------------------------------------------|------------------------------------------|
| **Hardware Performance** | Mid-range to high-end CPUs and GPUs | High-end CPUs and GPUs | Top-tier CPUs and multiple GPUs | Multiple high-end CPUs and GPUs |
| **Memory Size** | Up to 64GB (most have 16GB-32GB) | Up to 128GB or more | 128GB to 512GB or more | Terabytes of RAM |
| **GPU** | Integrated or dedicated GPUs | High-end dedicated GPUs (e.g., NVIDIA RTX) | Professional GPUs (e.g., NVIDIA Quadro/RTX A-series) | Multiple professional GPUs (e.g., NVIDIA Tesla/Quadro) |
| **NPU** | SoC Dependent | Limited NPU support | Available in some high-end models | Available, especially in AI-optimized servers |
| **OS** | Windows, macOS, Linux | Windows, macOS, Linux | Windows, macOS, Linux | Linux, Windows Server |
| **Portability** | Highly portable | Not portable | Not portable | Not portable |
| **Application Scenarios** | Mobile work, basic to moderate tasks | Stationary use, moderate to intensive tasks | Intensive tasks, advanced analysis | Large-scale projects, remote access, collaborative research |
| **ARM vs x86** | Mostly x86 (some ARM options like Apple Silicon and Snapdragon X Elite) | Mostly x86 except for Apple | Mostly x86 except for Apple | Mostly x86 (ARM servers available, e.g., AWS Graviton) |
**Owner:** I'm curious: have you tried installing Python on a non-Mac ARM computer, e.g. one featuring a Snapdragon CPU? I'm curious how well the Python ecosystem is compatible with these machines. Also, which operating systems run on non-Mac ARM computers?

**Author:** So far I have tested several ARM platforms, like the Raspberry Pi and an NVIDIA SBC; both run Ubuntu.

Software is slowly catching up with pre-compiled libraries on the Linux side. If one uses VSCode and miniforge, the environment is quite mature.

miniforge has no pre-compiled version for ARM Windows; 100% not recommended.

Certain image libraries require OpenGL, and some SoCs from Broadcom (RPi) natively run OpenGLES2, which emulates OpenGL using Mesa drivers from Debian. High-performance rendering, like 3D data plots, is very bottlenecked.

**Owner:** OK, then how likely is it that someone who aims at analysing images uses a Raspberry Pi or an NVIDIA SBC?

| **ARM Performance** | Energy-efficient, good for battery life | Limited use, lower performance than x86, suitable for edging computing like smart microscopy | Rare, used in specific scenarios | High efficiency, used in cloud services |
**Owner:** The not-so-computational reader might wonder what "edging computing" is.

**Author:** Removed the term.

| **x86 Performance** | High performance, widely supported | Higher performance, widely supported | Highest performance, widely supported | Highest performance, widely supported |
**Owner:** How do you think about the difference between x86 and x64?

**Author:** x64 (in full, x86-64) is the 64-bit version of the x86 instruction set; x86 is the broader family name.



### Key Considerations
- **OS**: Choose the OS based on software compatibility and personal preference. Windows and Linux are common across all device types, with macOS being exclusive to laptops and desktops. A greater concern is the OS terminal and [shell scripting](#shell-scripting).
**Owner:** Can you complete this sentence? What's your concern?

- **Processor Architecture**: ARM processors are known for energy efficiency and are increasingly used in laptops (e.g., Apple M1/M2) and servers (e.g., AWS Graviton). x86 processors dominate in performance and are widely supported across all device types, making them the standard choice for high-performance bioimage analysis tasks. For details check [here](#processor-chipset-architecture).
- **GPU**: Essential for handling complex image processing and analysis. Laptops typically have consumer-grade GPUs, while desktops and workstations offer higher-end consumer or professional-grade GPUs. A discrete GPU in a laptop consumes a lot of power, which limits portability and a sustainable coding environment. Servers can have multiple high-end GPUs optimized for parallel processing and large-scale computations. Considering the dominance of CUDA in the AI domain, NVIDIA is the only recommended vendor. For details on GPU applications check [here](#gpu-support).
**Owner:** It may make sense to mention deep learning (DL) here. Many DL applications require a GPU.

- **NPU**: Neural Processing Units are becoming more relevant for AI and machine learning tasks. Their performance across vendors is yet to be benchmarked. To understand more about NPUs, read the section [here](#neural-processing-unit-npu).
**Owner:** How much experience do you have with those? I think I never worked with an NPU. Is it maybe too early to mention this technology in an introduction document for beginners?



### Summary:
- **Laptops**: Best for portability and moderate analysis tasks, with some ARM options for better energy efficiency.
**Owner:** Before we recommend ARM (it sounds a bit like we do), I would like to make sure that Python runs on this architecture. (See comment above.)

- **Desktops**: Offer higher performance and memory capacity, suitable for stationary use with high-end GPU options.
- **Workstations**: Provide top-tier performance with advanced GPU and NPU options, ideal for demanding bioimage analysis tasks.
- **Servers**: Unmatched in performance and memory, perfect for large-scale, collaborative, and remote-access analysis tasks, with ARM options for energy-efficient cloud computing.
**Owner:** Have you tried ARM-based servers? Where are those accessible?


Choose the appropriate device based on your specific needs, considering the balance between portability, performance, and the nature of your bioimage analysis tasks.

## Shell Scripting
For historical reasons, command-line (CLI) batch scripting is divided into the Unix-like Bash (macOS, Linux) and the DOS-like PowerShell (Windows). For Windows users it is always recommended to run Python alongside Git Bash (https://git-scm.com/downloads), which maximally mimics the *nix running environment.

With the seamless integration of the Bash terminal, Remote SSH and the Jupyter extension in VSCode, the experience on different operating systems does not differ very much. But on certain special occasions, like 2FA security logins to computing clusters, Linux or macOS can retain a better experience through *nix-specialised functions such as SSH sockets for connection persistence.
**Owner** (@haesleinhuepf, Jul 8, 2024): This sentence contains multiple terms non-computational folks may not know (SSH, VSCode, 2FA, sockets).
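
As a concrete illustration of the connection persistence mentioned above, a minimal `~/.ssh/config` entry might look like the following sketch (hostname and user name are placeholders):

```
Host cluster
    # placeholder address and user name for your computing cluster
    HostName cluster.example.org
    User alice
    # reuse one authenticated connection via a control socket
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    # keep the master socket alive for 10 minutes after the last session
    ControlPersist 10m
```

With this in place, repeated `ssh cluster` logins reuse the first authenticated connection (including its 2FA prompt) instead of re-authenticating each time; the `~/.ssh/sockets` directory must exist beforehand.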


| Operating System | Terminal Emulator | Default Shell | Additional Shells | Pros | Cons |
|------------------|------------------------------|------------------|-------------------------------------|------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Linux | GNOME Terminal, Konsole | Bash | Zsh, Fish, Ksh, Tcsh, Dash | Highly customizable, vast array of tools, strong community support, open-source | Fragmentation in terminal emulators, varying default configurations|
| macOS | Terminal, iTerm2 | Zsh (since 10.15) | Bash, Fish, Ksh, Tcsh | User-friendly, well-integrated with macOS, iTerm2 offers advanced features | Terminal app is less feature-rich compared to iTerm2 |
| Windows | Command Prompt, PowerShell, Windows Terminal | PowerShell | Bash (via WSL), Git Bash, Cygwin | Powerful scripting capabilities in PowerShell, WSL brings Linux compatibility | Command Prompt is limited, PowerShell syntax can be complex |
**Owner:** Looking at this table, I find many terms that must be confusing for beginners. Also, I'm wondering what a reader should conclude from the table. I see three options:

  • Write more about the context: why should I look into the table?
  • Add links to all terms that are potentially new to readers.
  • Remove the table.


## Processor Chipset Architecture
<div style="text-align: center;">
<img src="./SoC.png" alt="Placeholder Image" style="width:50%;">
<p><em>Intel Meteor Lake processor architecture. Modern IC (integrated circuit) vendors tend to integrate various computation components on a single chipset to improve performance. When performing bioimage analysis we often utilise the processor's different computational units. Certain processor architectures are better suited to specific tasks; e.g. image decode/encode tasks can take advantage of the Intel Integrated Performance Primitives (IPP) library with hardware-level acceleration.</em></p>
**Owner:** What is an IC vendor? Why is it necessary to refer to this particular architecture (Meteor Lake)?

</div>

Modern computer CPUs increasingly lean towards a System-on-a-Chip (SoC) design that integrates all major components of a computing device, including the CPU, GPU, NPU and RAM. The physical compactness shortens the communication routes between the computing units, hence improving computing performance.
**Owner:** Maybe my knowledge is not up to date, but as far as I know the only vendor shipping SoCs to end users is Apple at the moment. Correct?


However, SoCs may still be classified by their CPU instruction sets, mainly x86 and ARM. Python libraries natively built for one architecture may not be directly runnable on the other without OS-layer translation or compilation from source; i.e., legacy x86 Python libraries may not run on ARM computers. For power-performance reasons, chipset manufacturers are releasing new SoCs with the ARM architecture, yet most existing bioimage analysis software is pre-compiled for x86. With the effort of Apple's Rosetta 2 the issue is relieved, yet not 100% compatible. So bear this in mind when choosing an adequate CPU for your analysis work.
**Owner:** This section appears very complicated for non-computational folks. Is it necessary to explain this to people who want to buy a computer for image processing?


When necessary, consult the code developer about support for your CPU platform. The following is a summary of CPU architectures:

| Feature | Apple Silicon | Intel | AMD | Snapdragon X Elite | NVIDIA |
|----------------------------------|--------------------|---------------------|-------------------|--------------------|--------------------|
| **Architecture** | ARM-based | x86/x86-64 | x86/x86-64 | ARM-based | ARM-based (Grace CPU) |
**Owner:** Is the ARM-based NVIDIA CPU something normal people can buy?

| **Notable Series** | M1, M2 | Core, Xeon | Ryzen, EPYC | Snapdragon 8cx | Grace CPU |
| **Manufacturing Process** | 5nm (TSMC) | 10nm, 7nm, 14nm | 7nm, 6nm, 5nm | 7nm (TSMC) | 5nm (TSMC) |
| **Performance Cores** | High-performance | High-performance | High-performance | High-performance | High-performance |
| **Efficiency Cores** | High-efficiency | Not typical | Not typical | High-efficiency | Not typical |
| **Integrated Graphics** | Yes (Apple GPU) | Yes (Intel Iris, UHD) | Yes (Radeon Graphics) | Yes (Adreno GPU) | Yes (NVIDIA GPU) |
| **Thermal Design Power (TDP)** | Low to moderate | Moderate to high | Moderate to high | Low | Moderate to high |
| **Primary Use Cases** | Laptops, Desktops | Laptops, Desktops, Servers | Laptops, Desktops, Servers | Laptops, Mobile Devices | HPC, AI, Data Centers |
| **Special Features** | Unified Memory | Hyper-Threading, vPro | Simultaneous Multithreading (SMT) | AI Engine, 5G | AI Acceleration, NVLink |
| **Compatibility** | macOS | Windows, Linux, macOS | Windows, Linux | Windows, Android | Linux |
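
Which side of the x86/ARM divide your own machine sits on can be checked with Python's standard library alone; this is a minimal sketch for readers unsure what wheels their interpreter will receive:

```python
import platform


def describe_interpreter():
    """Report the CPU architecture and OS the Python interpreter runs on.

    platform.machine() typically returns 'x86_64' or 'AMD64' on x86-64
    machines, and 'arm64' or 'aarch64' on ARM machines (e.g. Apple Silicon).
    """
    return {
        "machine": platform.machine(),        # CPU architecture string
        "system": platform.system(),          # 'Linux', 'Darwin' (macOS) or 'Windows'
        "python": platform.python_version(),  # interpreter version
    }


info = describe_interpreter()
print(f"Python {info['python']} on {info['system']}/{info['machine']}")
```

Knowing this string is often enough to explain why a pre-compiled library refuses to install: pip selects wheels tagged for exactly this architecture.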

## GPU Support
### AI Training
Though all SoC manufacturers embed a GPU in the chipset, AI-based analysis relies largely on NVIDIA CUDA as the base software stack. The common neural network libraries in Python (PyTorch and TensorFlow) are the foundation of popular models like UNet, Cellpose and StarDist. Though we are seeing recent PyTorch support for AMD ROCm and Intel oneAPI AI acceleration, community support is fairly limited compared to CUDA. Considering training scalability and infrastructure support across major GPU farms/research clusters, NVIDIA is still the sole option to consider for new model training.
**Owner:** I'm not sure if SoC users are the primary target audience. I'm wondering if most imaging scientists in "rich" institutes may have a workstation with an NVIDIA GPU. Less wealthy image analysts may do their work on cheaper laptops, perhaps offering gaming GPUs.

**Author:** The NPU is more for inference. In my analysis experience, cell detection now relies about 80% on AI-based segmentation (mainly Cellpose).

I don't think many bioimage analysts will train their own specific cell detection model. That's why I think the NPU will play a significant role in the upcoming years and is worth mentioning even more than the GPU.

All new laptop CPUs from 2024 onward are SoCs, most of them with an embedded NPU. The only difference is whether they have an independent CUDA chipset.
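
Whether a CUDA (or Apple MPS) device is actually usable can be probed at runtime before committing to a training backend. A minimal sketch, assuming PyTorch as the NN library and falling back gracefully when it is absent:

```python
import importlib.util


def pick_device():
    """Return a device string for deep-learning work.

    Prefers CUDA (NVIDIA GPUs), then Apple's MPS backend (Apple Silicon),
    and falls back to the CPU when neither is available or PyTorch is
    not installed at all.
    """
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)  # absent in old PyTorch versions
        if mps is not None and mps.is_available():
            return "mps"
        return "cpu"
    return "cpu"  # no PyTorch: everything runs on the CPU


print(f"Selected device: {pick_device()}")
```

A check like this is why the vendor choice matters in practice: on a machine without an NVIDIA GPU, training code silently falls back to the much slower CPU path.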


### AI Inference
Machine learning consists of two parts: model training and inference. The computational resources needed to apply a trained AI model to new data are much smaller than those needed to train from scratch. For smaller AI tasks, non-CUDA chipsets offer more options for bioimage analysis. The inference of neural-network-based AI can be physically accelerated with specifically designed circuits, often referred to as Neural Processing Units (NPUs). NVIDIA specifically added Tensor Cores, bundled with optimised packages like cuDNN and the Transformer Engine, to their later GPU products. We will cover this topic later in the article.
**Owner:** Again, I'm not sure if specifically designed circuits and NPUs are available to common bioimage analysts.

**Author:** They are on all newly released laptops, from Intel, Apple, AMD and NVIDIA.

Apple has equipped an NPU since the M1, and NVIDIA has had Tensor Cores since Volta (V100/GeForce RTX 20 series).

All Tensor Cores are enabled by default with TensorFlow (https://docs.nvidia.com/deeplearning/frameworks/tensorflow-user-guide/index.html#tf_disable_tensor_op_math).

Regarding beginners: if one cannot afford the high price of NVIDIA devices, I am wondering whether they should still have a guide on NPU-accelerated inference. Apple users will need the forked version of TensorFlow: https://developer.apple.com/metal/tensorflow-plugin/


<div style="text-align: center;">
<img src="./npu.png" alt="Placeholder Image" style="width:50%;">
<p><em>General Matrix Multiplication (GEMM) is the fundamental building block of neural network (NN) operations. The mathematical bases of NNs and of image manipulation are similarly embarrassingly parallel tasks involving matrices, which has led to GPUs being widely used in many machine learning tasks.</em></p>
</div>

### GPGPU Acceleration
Apart from AI applications, bioimage analysis tasks like single plane illumination fluorescence correlation spectroscopy (SPIM-FCS) perform [pixelwise fitting of the autocorrelation function](https://github.com/bpi-oxford/Gpufit/blob/master/Gpufit/models/spim_acfN.cuh). In quantitative imaging one may be interested in photon counting or camera-calibrated denoising, which relies largely on [pixel-by-pixel gain fitting](https://github.com/jackyko1991/sCMOS-Denoise/blob/main/notebooks/camera_calibration.ipynb). Such image analysis can utilise the parallelisation power of the GPU to accelerate research.
**Author:** Link removed.


One high-level analysis package, [py-clesperanto](https://github.com/clEsperanto/pyclesperanto_prototype), provides GPU acceleration based on OpenCL. Such a computing process shows that the GPU is not bound to graphics processing but can serve more generic calculations; for this reason the GPU is often referred to as a general-purpose GPU (GPGPU). In this sense, vendors like AMD and Intel are alternatives to NVIDIA.
**Owner:** How about mentioning AMD and Intel GPUs in a table above?

**Author:** Added a comparison between GPU vendors.


## Neural Processing Unit (NPU)
<div style="text-align: center;">
<img src="./npu_2.png" alt="Placeholder Image" style="width:50%;">
<p><em>Schematic depiction of the matrix product AB of two matrices A and B. NPUs implement GEMMs by partitioning the output matrix into tiles, which are loaded in parallel from a memory buffer, multiplied, and accumulated into the output.</em></p>
</div>

A Neural Processing Unit (NPU) is a specialized hardware accelerator designed to efficiently handle the computational demands of AI and machine learning tasks, particularly neural network inference and training. NPUs are optimized for the types of operations commonly used in deep learning, such as matrix multiplications, convolutions, and activation functions. As of mid-2024, NPUs are embedded in various SoCs, allowing a wider choice of AI applications.

| Feature | Google TPU (USB/M.2) | Apple Silicon | AMD | Intel (after Meteor Lake) | NVIDIA (Grace Hopper) | NVIDIA (Jetson) | Snapdragon X Elite |
|----------------------------------|---------------------------|--------------------|--------------------------|---------------------------|---------------------------|---------------------------|---------------------------|
| **Product Name** | Edge TPU | Apple Neural Engine| 3rd Gen Ryzen AI| VPU, GNA, AI Engine | TensorRT, DLA, Grace Hopper| Jetson Xavier, Nano, TX2 | Qualcomm AI Engine |
| **Primary Use Case** | Edge AI, Low Power Devices| Mobile, Desktop | GPUs with AI Capabilities| Mobile, Desktop, Edge AI | Data Center, HPC, Embedded | Embedded AI | Mobile, Edge Computing |
| **Performance** | Moderate | High | Moderate to High | Moderate to High | Very High | Moderate to High | Moderate |
| **Efficiency** | High | High | Moderate | High | Moderate to High | High | High |
| **Special Features** | Google Cloud Compatible, Tensor Operations| Unified Memory, Tight OS Integration | APUs, ROCm | Low Power, Vision Processing, Integrated AI | CUDA Integration, Tensor Cores | Low Power, Integrated AI | Integrated 5G, AI on Device |
| **Flexibility** | Specialized for TensorFlow| General Purpose | AI with General Compute | Specialized for AI and Vision| Highly Specialized | General Purpose | General Purpose |
| **Compatibility** | TensorFlow Lite | macOS | Windows, Linux | Windows, Linux | Windows, Linux | Linux | Android, Windows |
| **Scalability** | High | Moderate | Moderate | Moderate | High | Moderate | Moderate |
| **Integration** | Edge Devices | Mobile, Desktop | GPUs | Mobile, Desktop, Edge Devices | HPC, Cloud, Embedded | Embedded Systems| Mobile SoCs |
| **Availability** | USB, M.2 Modules | Built-in (A-series, M-series)| Radeon Instinct GPUs | Integrated in Meteor Lake CPUs | Available in GPUs, Servers | Available in Embedded Modules | Snapdragon SoCs |
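
The tiled-GEMM scheme described above can be made concrete with a short sketch in plain Python (no vendor libraries assumed): the output matrix is partitioned into small tiles, and each tile is accumulated independently, which is exactly the structure NPU and Tensor Core hardware parallelises.

```python
def tiled_matmul(A, B, tile=2):
    """Multiply matrices A (n x k) and B (k x m) tile by tile.

    Each (tile x tile) block of the output C is accumulated independently,
    mirroring how NPUs partition a GEMM across their processing elements.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile rows of the output
        for j0 in range(0, m, tile):      # tile columns of the output
            for p0 in range(0, k, tile):  # walk along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(tiled_matmul(A, B))  # same result as an ordinary matrix product
```

In hardware the three inner loops become a single fused multiply-accumulate over a tile held in fast local memory, which is where the energy efficiency of NPUs comes from.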
Binary file added docs/01_introduction/npu.png
Binary file added docs/01_introduction/npu_2.png
1 change: 1 addition & 0 deletions docs/_toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ parts:
- caption: Basics
chapters:
- file: 01_introduction/trailer
- file: 01_introduction/hardware
**Owner:** I'm wondering if the document might fit better in an advanced section, e.g. in the GPU-acceleration section? We should certainly not introduce hardware aspects before anything else.

**Author:** Moved to a separate hardware section.

- file: 01_introduction/readme

- file: 02_python_basics/readme
Expand Down