Common OS issues
The following are common OS misconfiguration issues for Node and Gateway systems:
- DNS
- NVIDIA CUDA driver (Node only)
- Firewall
- Python modules
- TLS certificates
These are the most frequently seen issues. Other, external problems can also occur, such as high network latency between your Node and the cMixx Permissioning service, which may prevent the Node from resuming cMixx service for hours.
DNS
DNS problems are usually one of two kinds: misconfigured DNS (where ping NODE_FQDN does not resolve to the correct IP address) or broken Dynamic DNS (where the Node or the Gateway does not have a static IP address and Dynamic DNS is not creating or updating the DNS record).
Refer to the generic Ubuntu documentation and to the documentation of the Dynamic DNS solution you use.
CUDA
If your Node uses a GPU to assist with cMixx computation, CUDA needs to work. The easiest way to check is with nvidia-smi:
$ sudo nvidia-smi
Sat Nov 16 15:39:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
The output above shows that driver version 550.120 is installed and in use. If there were a driver version mismatch, the command would fail.
If the CUDA driver gets upgraded, nvidia-smi may stop working until the Node is rebooted.
The cMixx log may show errors similar to this:
FATAL 2024/11/15 22:22:22 Couldn't initialize GPU. Error: CU error occurred: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE
Reboot for the new driver to take effect, or downgrade to the version you had before and reboot to restore it.
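Before rebooting, it can help to confirm that the driver is what is failing. The sketch below is one possible check, with a hypothetical function name (check_gpu_driver). It assumes that a failing nvidia-smi after an upgrade points to a pending reboot, which is the situation described above. Other causes are possible.

```shell
# Hypothetical helper: check whether the NVIDIA driver responds.
check_gpu_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?"
    return 2
  fi
  # Prints one driver version per GPU when the driver is healthy.
  if nvidia-smi --query-gpu=driver_version --format=csv,noheader; then
    echo "OK: driver is loaded and responding"
    return 0
  fi
  echo "nvidia-smi failed; if the driver was just upgraded, a reboot is likely pending"
  # Version of the kernel module currently loaded (the file is absent if none is loaded).
  cat /proc/driver/nvidia/version 2>/dev/null
  return 1
}
```

Run check_gpu_driver on the Node. A return code of 1 right after a driver upgrade is the case where a reboot, or a downgrade followed by a reboot, is expected to help.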
Firewall (UFW)
Node and Gateway have slightly different firewall rules. Refer to the Node or Gateway configuration instructions and use the UFW Cheat Sheet to verify that your configuration steps are correct.
Python modules
This may occur on both new and old Node or Gateway systems. Symptoms include various Python-related errors in service scripts.
On new nodes, simply repeat the Python-related configuration steps for Node or Gateway.
On old nodes that have been upgraded (e.g., from Ubuntu 22.04 LTS to 24.04 LTS), re-install the Python modules listed on the "Software dependencies" pages.
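To see which modules are missing before re-installing anything, you can try importing each one. This is a rough sketch with a hypothetical function name (check_py_modules). The module names are not listed here, so pass the ones from the "Software dependencies" pages. It assumes the service scripts use the same python3 interpreter as your shell.

```shell
# Hypothetical helper: report which of the given Python modules fail to import.
# Usage: check_py_modules MODULE [MODULE...]
check_py_modules() {
  local missing=0 mod
  for mod in "$@"; do
    if python3 -c "import $mod" >/dev/null 2>&1; then
      echo "ok       $mod"
    else
      echo "MISSING  $mod"
      missing=1
    fi
  done
  # Returns 1 if at least one module could not be imported, otherwise 0.
  return "$missing"
}
```

Run it as the same user that runs the service scripts, since modules installed for one user may not be visible to another.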
TLS certificates
Expired TLS certificates may stop a working Node or Gateway.
You may check your TLS certificate expiration on your own (using openssl or another CLI tool), but the Team usually posts about them on the Community Forum and on Discord (the announcements channel).
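A minimal sketch of such a check with openssl follows. The function name check_cert_expiry is hypothetical, and the certificate path is whatever your Node or Gateway configuration points to (it is not specified here). The 30-day default warning window is an arbitrary choice.

```shell
# Hypothetical helper: print a certificate's expiry date and warn if it expires soon.
# Usage: check_cert_expiry CERT_FILE [DAYS]   (DAYS defaults to 30)
check_cert_expiry() {
  local cert="$1" days="${2:-30}"
  # -enddate prints the certificate's notAfter field.
  openssl x509 -in "$cert" -noout -enddate || return 2
  # -checkend N exits non-zero if the certificate expires within N seconds.
  if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
    echo "OK: valid for at least $days more days"
  else
    echo "WARNING: expires within $days days (or has already expired)"
    return 1
  fi
}
```

For example, check_cert_expiry /path/to/certificate.crt 14 warns if the certificate at that placeholder path expires within two weeks.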
Other issues
As mentioned at the top, external or environment issues are out of scope here. If you suspect that network issues are preventing your Node from mixing, you may try this workflow based on YABS, which is a wrapper for several popular utilities and tools. In short, install YABS, run it as follows, and compare your output with the results at the link.
curl -sL yabs.sh | bash -s -- -gf
This can also be useful on Gateway systems, although they are less affected by network latency. A Node and Gateway pair with low but sufficient performance may see more cMixx failures, but it will still mix.
A Node with high latency may be unable to start mixing in the first place.