Joonas' Note
nvidia-smi command summary
View the status of all GPUs
nvidia-smi
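The table printed by plain nvidia-smi is meant to be read by eye. For scripts, nvidia-smi can print selected fields as CSV with --query-gpu and --format=csv,noheader,nounits, one line per GPU. The sketch below uses that output to flag hot GPUs. The helper name hot_gpus and its 80 °C default threshold are hypothetical choices for illustration, not part of nvidia-smi.

```shell
#!/bin/sh
# hot_gpus: print a comma-separated list of GPU indices whose temperature is
# at or above a threshold. Hypothetical helper; it expects stdin lines of the
# form "index, temperature", i.e. the output of:
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits
# Usage:
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | hot_gpus 80
hot_gpus() {
    # $1: threshold in degrees Celsius (assumed default: 80)
    awk -F', *' -v limit="${1:-80}" '
        # force numeric comparison of the temperature field
        $2 + 0 >= limit + 0 { printf "%s%s", sep, $1; sep = "," }
        END { print "" }
    '
}
```

For example, fed the two temperatures from the -i 0,3 output in this post (48C for GPU 0, 64C for GPU 3), hot_gpus 60 prints 3.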
View the status of a specific GPU
Pass the GPU index after -i. A UUID or a PCI bus ID also works.
nvidia-smi -i 0
You can also show several GPUs at once by separating them with commas.
$ nvidia-smi -i 0,3
Mon Nov  3 14:02:05 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.10   Driver Version: 470.141.10   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0  On |                    0 |
| N/A   48C    P0    41W / 300W |    218MiB / 16155MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   64C    P0    43W / 300W |     10MiB / 16158MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3174      G   /usr/lib/xorg/Xorg                102MiB |
|    0   N/A  N/A      4054      G   /usr/lib/xorg/Xorg                 67MiB |
|    0   N/A  N/A      4253      G   /usr/bin/gnome-shell               26MiB |
|    3   N/A  N/A      3174      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      4054      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
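The --query-gpu mode can be combined with -i to restrict it to particular GPUs, and it is handy for picking idle GPUs before starting a job. A minimal sketch, assuming a hypothetical helper free_gpus that treats a GPU as free when its used memory is below a threshold in MiB (100 by default, an arbitrary choice):

```shell
#!/bin/sh
# free_gpus: print a comma-separated list of GPU indices whose used memory
# is below a threshold in MiB. Hypothetical helper; it expects stdin lines
# of the form "index, memory.used", i.e. the output of:
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
free_gpus() {
    # $1: memory threshold in MiB (assumed default: 100)
    awk -F', *' -v max="${1:-100}" '
        # force numeric comparison of the memory field
        $2 + 0 < max + 0 { printf "%s%s", sep, $1; sep = "," }
        END { print "" }
    '
}

# Example (needs a machine with nvidia-smi):
#   export CUDA_DEVICE_ORDER=PCI_BUS_ID
#   export CUDA_VISIBLE_DEVICES="$(nvidia-smi --query-gpu=index,memory.used \
#       --format=csv,noheader,nounits | free_gpus 100)"
```

Fed the two GPUs from the table above (218 MiB and 10 MiB used), free_gpus 100 prints 3. CUDA_DEVICE_ORDER=PCI_BUS_ID is set in the example because CUDA's default order puts the fastest device first, so without it the CUDA indices may not match the PCI bus order that nvidia-smi uses.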
List all GPUs
nvidia-smi -L
GPU 0: Tesla V100-DGXS-16GB (UUID: GPU-4a5696d4-12ff-d8a3-f604-d25020c46dc9)
GPU 1: Tesla V100-DGXS-16GB (UUID: GPU-036e8961-7968-e8ed-05db-3ed1117387ab)
Unable to determine the device handle for gpu 0000:0E:00.0: Unknown Error
GPU 3: Tesla V100-DGXS-16GB (UUID: GPU-4ce0c45f-8be0-d20f-db73-e6e0e254b51f)
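A GPU that has died from overload or overheating also shows up in this list, as above: GPU 2 is missing and only an error line with its PCI bus ID appears in its place.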
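On a node with many GPUs, the error line is easy to miss by eye. A minimal sketch, assuming a hypothetical helper dead_gpus, extracts the PCI bus IDs from lines matching the error message shown above. The exact wording of that message may differ between driver versions, and the 2>&1 in the usage comment is there because the error may be printed to stderr rather than stdout.

```shell
#!/bin/sh
# dead_gpus: print the PCI bus IDs that nvidia-smi -L could not get a handle for.
# Hypothetical helper; it matches the "Unable to determine the device handle"
# error line format seen in the -L output above.
# Usage (capture both streams, since the error may go to stderr):
#   nvidia-smi -L 2>&1 | dead_gpus
dead_gpus() {
    # keep only the bus ID between "for gpu " and the following ": ..."
    sed -n 's/^Unable to determine the device handle for gpu \([0-9A-Fa-f:.]*\):.*/\1/p'
}
```

Run against the -L output in this post, it prints 0000:0E:00.0, and nothing when every GPU is listed normally.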
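Limit the maximum GPU power
-pm 1 enables persistence mode, and -pl sets the power limit in watts. Without -i, the limit is applied to every GPU, as the output below shows. Both options require root privileges.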
$ nvidia-smi -pm 1
$ nvidia-smi -pl 220
Power limit for GPU 00000000:07:00.0 was set to 220.00 W from 300.00 W.
Power limit for GPU 00000000:08:00.0 was set to 220.00 W from 300.00 W.
Power limit for GPU 00000000:0E:00.0 was set to 220.00 W from 300.00 W.
Power limit for GPU 00000000:0F:00.0 was set to 220.00 W from 300.00 W.
All done.
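-pl only accepts values within the range the board supports. A minimal sketch of a pre-check, assuming a hypothetical helper power_limit_ok, reads that range from the power.min_limit and power.max_limit query fields and uses -i to target a single GPU instead of all of them.

```shell
#!/bin/sh
# power_limit_ok: exit 0 if the requested wattage lies within the board's
# allowed range, 1 otherwise. Hypothetical helper; it expects one stdin line
# of the form "min_limit, max_limit" in watts, i.e. the output of:
#   nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits
power_limit_ok() {
    # $1: requested power limit in watts
    awk -F', *' -v want="$1" '
        NR == 1 { ok = (want + 0 >= $1 + 0 && want + 0 <= $2 + 0) }
        END { exit (ok ? 0 : 1) }
    '
}

# Example (needs root and a GPU):
#   nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit \
#       --format=csv,noheader,nounits | power_limit_ok 220 && sudo nvidia-smi -i 0 -pl 220
```

With a reported range of 150.00 to 300.00 W (illustrative numbers, not read from this machine), power_limit_ok 220 succeeds and power_limit_ok 350 fails. Empty input also counts as a failure, so the -pl command is skipped if the query returns nothing.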
Full list of options (official documentation)
https://docs.nvidia.com/deploy/nvidia-smi/index.html