We were configuring HBase for multiple NICs and set both hbase.regionserver.ipc.address and hbase.master.ipc.address to 0.0.0.0. This caused hostname lookup problems, which (we now know) was a known issue years ago.
We’ve backed the change out, but ended up with 70 regions stuck in the OFFLINE state and still assigned to a dead region server with hostname 0:0:0:0:0:0:0:0.
Restarting the master should transition them, but it doesn’t like the server name. It seems to want to split the dead server’s logs during the transition and fails there:
2024-04-24 11:02:26,475 ERROR [MASTER_SERVER_OPERATIONS-master:60000-1] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: **0:0:0:0:0:0:0:0,60020,1713708572030-splitting**
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.hadoop.fs.Path.suffix(Path.java:354)
at org.apache.hadoop.hbase.master.MasterFileSystem.getLogDirs(MasterFileSystem.java:315)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:405)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:383)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:281)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:196)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: 0:0:0:0:0:0:0:0,60020,1713708572030-splitting
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
I can confirm that Hadoop itself does not accept the name:
/home/gs/Dev/hadoop/bin/hdfs dfs -mkdir /hbase/WALs_mike/0:0:0:0:0:0:0:0,60020,1713708572030-splitting
Returns: /hbase/WALs_mike/0:0:0:0:0:0:0:0,60020,1713708572030-splitting is not a valid DFS filename
We’re on an older version: 0.96.1.1
I think the main problem is the code can’t handle the bad host name: 0:0:0:0:0:0:0:0.
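For what it’s worth, the exception looks reproducible with plain java.net.URI, no HBase or HDFS needed. Below is a minimal sketch, assuming Hadoop’s Path(String) treats the text before the first colon as a URI scheme whenever no slash comes earlier; that is my reading of the trace, not something I have checked against the 0.96-era source. The class name, the reasonFor helper, and the rs1.example.com hostname are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {

    // Hypothetical helper. Assumption: mimics how Hadoop's Path(String) appears
    // to split a scheme (text before the first ':' when no '/' precedes it),
    // then builds the URI with the 5-argument java.net.URI constructor.
    static String reasonFor(String name) {
        int colon = name.indexOf(':');
        int slash = name.indexOf('/');
        String scheme = null;
        String path = name;
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = name.substring(0, colon);   // "0" for the bad server name
            path = name.substring(colon + 1);    // rest has no leading '/'
        }
        try {
            new URI(scheme, null, path, null, null);
            return "ok";
        } catch (URISyntaxException e) {
            // A scheme plus a path that does not start with '/' is rejected here.
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // Server name taken from the master log above.
        System.out.println(reasonFor("0:0:0:0:0:0:0:0,60020,1713708572030-splitting"));
        // Same shape with a made-up, colon-free hostname.
        System.out.println(reasonFor("rs1.example.com,60020,1713708572030-splitting"));
    }
}
```

If that assumption holds, the first colon turns "0" into a scheme and leaves a relative path behind it, which would explain why only the 0:0:0:0:0:0:0:0 server name trips the log splitting while ordinary hostnames do not.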
Is there any way to disable the log splitting during M_SERVER_SHUTDOWN?
Is there any way to rename the host associated with those regions?
I had no luck with hbck or with the shell’s assign, move, and unassign commands:
2024-04-24 08:15:19,492 INFO [RpcServer.handler=20,port=60000] master.AssignmentManager: Skip assigning c,x92I$x92I$x92I$x92I$x92I$x90,1330810355551.d31a033cd7810e347639e12833969754., it's host 0:0:0:0:0:0:0:0,60020,1713708572030 is dead but not processed yet