I am trying to implement this repo with the Kvasir-SEG dataset. The error occurs at the model.fit call (line 81).
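For context, the failing line is a plain Keras model.fit. Structurally it is roughly the following (a heavily simplified stand-in with dummy data, not the repo's actual code; the batch size of 4 and 256x256 inputs match the shapes in the log below):

import tensorflow as tf

# Dummy stand-ins just to show the call shape; the real run.py builds
# ResUNet++ and tf.data pipelines from the Kvasir-SEG images and masks.
images = tf.random.uniform((8, 256, 256, 3))
masks = tf.cast(tf.random.uniform((8, 256, 256, 1)) > 0.5, tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((images, masks)).batch(4)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# The equivalent call in run.py (line 81) is where the crash happens;
# the real script trains for 200 epochs.
model.fit(dataset, epochs=1)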
This link is close to my problem, but none of the answers there worked, so I have included more of the error log below to give answerers enough context.
When I run python3 run.py, it sometimes completes with no problem, but in most cases I get the following error:
Epoch 1/200
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1726636494.249767 16271 service.cc:146] XLA service 0x7f1f90001ce0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1726636494.249802 16271 service.cc:154] StreamExecutor device (0): NVIDIA GeForce RTX 3070 Laptop GPU, Compute Capability 8.6
2024-09-18 10:44:54.852933: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-09-18 10:44:56.370033: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 8907
2024-09-18 10:44:59.966977: E external/local_xla/xla/service/gpu/buffer_comparator.cc:153] Difference at 648122: 570.72, expected 698.733
2024-09-18 10:44:59.967805: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:697] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[4,256,32,32]{3,2,1,0}, u8[0]{0}) custom-call(f32[4,256,32,32]{3,2,1,0}, f32[256,256,3,3]{3,2,1,0}, f32[256]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false} for eng26{k2=0,k13=2,k14=3,k18=0,k22=0,k23=0} vs eng11{k2=0,k3=0}
2024-09-18 10:44:59.967826: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:312] Device: NVIDIA GeForce RTX 3070 Laptop GPU
2024-09-18 10:44:59.967832: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:313] Platform: Compute Capability 8.6
2024-09-18 10:44:59.967837: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:314] Driver: 12060 (INVALID_ARGUMENT: expected %d.%d, %d.%d.%d, or %d.%d.%d.%d form for driver version; got "1")
2024-09-18 10:44:59.967842: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:315] Runtime: <undefined>
2024-09-18 10:44:59.967850: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:320] cudnn version: 8.9.7
2024-09-18 10:45:00.530038: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:1857] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0000 00:00:1726636500.530095 16271 gpu_timer.cc:156] INTERNAL: Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0000 00:00:1726636500.530124 16271 gpu_timer.cc:162] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0000 00:00:1726636500.530136 16271 gpu_timer.cc:168] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-09-18 10:45:00.530143: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:1652] error deallocating host memory at 0x205200200: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-09-18 10:45:00.569431: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:1886] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered ::
Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.169 = (f32[4,64,256,256]{3,2,1,0}, u8[0]{0}) custom-call(f32[4,16,256,256]{3,2,1,0} %maximum.63, f32[64,16,3,3]{3,2,1,0} %transpose.577, f32[64]{0} %arg189.190), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="functional_1/conv2d_29_1/convolution" source_file="/home/suhas/ResUNetPlusPlus/.venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py" source_line=1177}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
Original error: INTERNAL: Failed to synchronize GPU for autotuning conv instruction
To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false. Please also file a bug for the root cause of failing autotuning.
[[{{node StatefulPartitionedCall}}]] [Op:__inference_one_step_on_iterator_45815]
2024-09-18 10:45:00.913684: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
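I have not yet tried the fallback that the message itself suggests, since it says this may just mask the root cause, but for reference I believe it would be enabled like this (the flag has to be set before TensorFlow/XLA initializes):

import os

# Enable the fallback conv-algorithm picker mentioned in the error message.
# Set the environment variable before importing TensorFlow.
os.environ["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"

import tensorflow as tf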
nvidia-smi output for my RTX 3070 laptop GPU:
Fri Sep 20 10:15:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 561.09 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 ... On | 00000000:01:00.0 Off | N/A |
| N/A 64C P0 33W / 130W | 4MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
I have tried restarting my PC multiple times, which sometimes lets the script run. I have also tried rm -rf ~/.nv/ to clear the NVIDIA cache.
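The only other thing I am considering, but have not tried yet, is switching TensorFlow to on-demand GPU memory allocation, in case the CUDA_ERROR_ILLEGAL_ADDRESS is memory related; roughly:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)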
My setup:
CUDA Toolkit 12.3
cuDNN 8.9.7
Python 3.10.12
TensorFlow 2.17.0
Running in WSL2
Installed TensorFlow with python3 -m pip install tensorflow[and-cuda]
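For what it's worth, the StreamExecutor line in the log above shows that TensorFlow/XLA does detect the GPU; a quick way to confirm the pip-installed CUDA wheels are being picked up in WSL2 is:

import tensorflow as tf

# Sanity check: print the TensorFlow version and the detected GPU(s).
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))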