I am experiencing a kernel panic with the error message "PANIC: zfs: adding existent segment to range tree" when attempting to import my ZFS root pool (rpool). The system hangs on every import attempt, which makes the problem difficult to diagnose and resolve.
Steps Taken:
Initial Diagnostics and Attempted Fixes:
- Verified disk health using smartctl.
- Attempted to import the pool in read-only and degraded modes.
- Tried force importing the pool with different cache options and alternate root.
- Used zdb for detailed diagnostics but encountered corruption errors.
Detailed Troubleshooting Steps:
Load ZFS Modules:
$ sudo modprobe zfs
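To confirm the module actually loaded, and to record which OpenZFS version is in play (worth including in any upstream report about this panic), a quick check along these lines helps:
$ lsmod | grep zfs
$ cat /sys/module/zfs/version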
Check Disk Visibility:
$ sudo fdisk -l
Disk /dev/sda: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: ST2000DM006-2DM1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x37c4bf6d

Device     Boot Start        End    Sectors  Size Id Type
/dev/sda1        2048 3907028991 3907026944  1.8T 83 Linux

Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 970 EVO Plus 2TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2E73ECD2-33DF-4F29-8B8E-936FF595189B

Device           Start        End    Sectors  Size Type
/dev/nvme0n1p1    2048    1050623    1048576  512M EFI System
/dev/nvme0n1p3 1050624    5244927    4194304    2G Solaris boot
/dev/nvme0n1p4 5244928 3907029134 3901784207  1.8T Solaris root

Disk /dev/nvme1n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 970 EVO Plus 2TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4DC3FC19-2023-4AD1-AA85-F36AD5ABA686

Device           Start        End    Sectors  Size Type
/dev/nvme1n1p1    2048    1050623    1048576  512M EFI System
/dev/nvme1n1p3 1050624    5244927    4194304    2G Solaris boot
/dev/nvme1n1p4 5244928 3907029134 3901784207  1.8T Solaris root
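As a quicker cross-check that the kernel recognizes the member partitions as ZFS, lsblk can report the detected filesystem type (assuming a util-linux recent enough to fill the FSTYPE column; zfs_member is expected on the -part3/-part4 partitions):
$ lsblk -o NAME,SIZE,TYPE,FSTYPE /dev/nvme0n1 /dev/nvme1n1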
List Available Pools:
$ sudo zpool import
   pool: rpool
     id: 3888825418065318545
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        rpool                                                      ONLINE
          nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NG0M709272H-part4  ONLINE
          nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NG0M709378T-part4  ONLINE

   pool: bpool
     id: 5219330193861745018
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        bpool                                                      ONLINE
          nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NG0M709272H-part3  ONLINE
          nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NG0M709378T-part3  ONLINE
Force Import Pools:
$ sudo zpool import -f -F -X rpool
-> Hangs with PANIC: zfs: adding existent segment to range tree
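For comparison with the read-only attempt mentioned under Steps Taken: a strictly read-only import is the variant most often reported to get past this particular panic, since it skips most of the space-map work in which the assertion fires. A minimal sketch, assuming /mnt as the alternate root (-N additionally skips mounting datasets):
$ sudo zpool import -o readonly=on -N -R /mnt -f rpool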
Use zdb to Examine Disks:
$ sudo zdb -l /dev/nvme0n1
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
    version: 5000
    name: 'rpool'
    state: 0
    txg: 219966048
    pool_guid: 3888825418065318545
    errata: 0
    hostid: 1585004891
    hostname: 'zenith'
    top_guid: 8732219980519990883
    guid: 8732219980519990883
    vdev_children: 2
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 8732219980519990883
        path: '/dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NG0M709272H-part4'
        whole_disk: 0
        metaslab_array: 265
        metaslab_shift: 34
        ashift: 12
        asize: 1997708722176
        is_log: 0
        DTL: 18443
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 2 3
$ sudo zdb -l /dev/nvme1n1
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
    version: 5000
    name: 'rpool'
    state: 0
    txg: 219966048
    pool_guid: 3888825418065318545
    errata: 0
    hostid: 1585004891
    hostname: 'zenith'
    top_guid: 11486562238434614080
    guid: 11486562238434614080
    vdev_children: 2
    vdev_tree:
        type: 'disk'
        id: 1
        guid: 11486562238434614080
        path: '/dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_2TB_S4J4NG0M709378T-part4'
        whole_disk: 0
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 1997708722176
        is_log: 0
        DTL: 18444
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 2 3
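The "failed to unpack label 0/1" lines are likely an artifact of pointing zdb at the whole disk: the pool lives on -part4, so labels 0 and 1 sit at the start of that partition, while labels 2 and 3 sit at its end, close enough to the end of the disk to still be found at whole-disk offsets. Reading the labels from the partitions themselves should show all four:
$ sudo zdb -l /dev/nvme0n1p4
$ sudo zdb -l /dev/nvme1n1p4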
Use smartctl to Check Disk Health:
$ sudo smartctl -a /dev/nvme0n1
$ sudo smartctl -a /dev/nvme1n1   (output omitted; the results are essentially identical)
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-31-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      S4J4NG0M709272H
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            964,368,625,664 [964 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 57915026de
Local Time is:                      Tue Jul  9 06:15:59 2024 UTC
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    53,033,125 [27.1 TB]
Data Units Written:                 518,776,150 [265 TB]
Host Read Commands:                 754,909,551
Host Write Commands:                10,619,920,206
Controller Busy Time:               28,529
Power Cycles:                       114
Power On Hours:                     23,139
Unsafe Shutdowns:                   86
Media and Data Integrity Errors:    0
Error Information Log Entries:      266
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc    LBA  NSID    VS  Message
  0        266     0  0x0000  0x4004      -      0     0     -  Invalid Field in Command

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
 0   Short             Completed without error           22551            -     -    -    -    -
Handling Pool Import Failures:
Attempted to import the pool using specific device paths but encountered kernel panics on both NVMe devices.
Tried various import options:
$ sudo zpool import -d /dev/nvme0n1p4 -R /mnt -f rpool
$ sudo zpool import -d /dev/nvme1n1p4 -R /mnt -f rpool
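One caveat worth double-checking here: in many zfsutils releases, -d expects a directory to search rather than a single device node, in which case the two invocations above may not have scanned what was intended. Pointing -d at /dev/disk/by-id also matches the device paths recorded in the labels:
$ sudo zpool import -d /dev/disk/by-id -R /mnt -f rpool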
Advanced Recovery Attempts:
Updated ZFS tools and kernel modules:
$ sudo apt-get update
$ sudo apt-get install --only-upgrade zfsutils-linux zfs-dkms
Used zdb for detailed diagnostics:
$ sudo zdb -e -bcsvL rpool
-> Hangs with PANIC: zfs: adding existent segment to range tree
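Since any full traversal hangs, a cheaper diagnostic is to dump the uberblock ring from one of the labels; the txg values listed there are the candidate rewind targets for a later recovery import:
$ sudo zdb -ul /dev/nvme0n1p4 | grep -E 'Uberblock|txg'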
Attempted import with recovery mode:
$ sudo modprobe zfs zfs_recover=1
$ sudo zpool import -FX rpool
-> Hangs with PANIC: zfs: adding existent segment to range tree
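One caveat with the modprobe line above: if the zfs module was already loaded, modprobe is a no-op and zfs_recover=1 is silently ignored, which may be why the panic still fired; this message comes from zfs_panic_recover(), which zfs_recover=1 demotes to a warning. A sketch of the runtime alternative (these are standard OpenZFS module parameters, set through /sys so they take effect immediately and reset at reboot; the spa_load_verify_* switches skip block verification during import):
$ echo 1 | sudo tee /sys/module/zfs/parameters/zfs_recover
$ echo 0 | sudo tee /sys/module/zfs/parameters/spa_load_verify_metadata
$ echo 0 | sudo tee /sys/module/zfs/parameters/spa_load_verify_data
$ sudo zpool import -o readonly=on -R /mnt -f rpool
If that imports, the data can be copied off with zfs send/recv before any repair attempt. Failing that, an extreme rewind to one of the txg values from the uberblock listing remains (-T is undocumented and may not be present in every build):
$ sudo zpool import -o readonly=on -f -T <txg> rpool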
Current Status:
The kernel panic persists on every import attempt. The assertion named in the panic points at the pool's space-map/range-tree metadata, so this looks like metadata corruption rather than failing hardware (SMART is clean on both NVMe devices).
Considering professional data recovery services or further community advice.
I am seeking advice on any additional steps or strategies to recover the ZFS pool or further isolate the corruption. Any insights from similar experiences or advanced troubleshooting techniques would be greatly appreciated.
At this point I am not sure where exactly the problem lies; my last-resort option is to destroy rpool and rebuild it from scratch.