CnosDB panics during reading and writing.
deploy: 3 meta + 2 querytskv
version: cnosdb 2.4.1, revision 9b25565a6c8ed5a12726475c0be6cb099ef980b2
I deployed a distributed environment, then ran write and query workloads through load_cnosdb and iotbench respectively. After two days, I found that one querytskv node had exited.
I checked the latest log and found that it had panicked. The panic log is as follows:
2024-06-28T09:00:43.646554474Z INFO coordinator::raft::manager: start raft node: RaftNodeSummary { tenant: "cnosdb", db_name: "test", group_id: 44, raft_id: 45 } Success
2024-06-28T09:00:56.188081189Z ERROR tskv::record_file::reader: Record file: Failed to read data: file_size: 687783936, Data size (469) at pos 687783756 is greater than bytes readed (180)
2024-06-28T09:00:56.240954635Z ERROR tskv::wal::wal_store: Error reading wal: RecordFileInvalidDataSize { pos: 687783742, len: 469, location: Location { file: "/home/runner/work/cnosdb/cnosdb/tskv/src/record_file/reader.rs", line: 249, column: 22 }, backtrace: Backtrace( 0: tskv::record_file::reader::Reader::read_record::{{closure}}
1: tskv::wal::wal_store::RaftEntryStorage::recover::{{closure}}
2: coordinator::raft::manager::RaftNodesManager::open_vnode_storage::{{closure}}
3: coordinator::raft::manager::RaftNodesManager::open_raft_node::{{closure}}
4: coordinator::raft::manager::RaftNodesManager::start_all_raft_node::{{closure}}::{{closure}}
5: tokio::runtime::task::raw::poll
6: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
7: tokio::runtime::scheduler::multi_thread::worker::run
8: tokio::runtime::task::raw::poll
9: std::sys_common::backtrace::__rust_begin_short_backtrace
10: core::ops::function::FnOnce::call_once{{vtable.shim}}
11: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
std::sys::unix::thread::Thread::new::thread_start
at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
12: <unknown>
13: __clone
) }
thread 'main' panicked at /home/runner/work/cnosdb/cnosdb/coordinator/src/service.rs:166:14:
called `Result::unwrap()` on an `Err` value: TskvError { source: WalTruncated { location: Location { file: "/home/runner/work/cnosdb/cnosdb/tskv/src/wal/wal_store.rs", line: 427, column: 54 }, backtrace: Backtrace( 0: tskv::wal::wal_store::RaftEntryStorage::recover::{{closure}}
1: coordinator::raft::manager::RaftNodesManager::open_vnode_storage::{{closure}}
2: coordinator::raft::manager::RaftNodesManager::open_raft_node::{{closure}}
3: coordinator::raft::manager::RaftNodesManager::start_all_raft_node::{{closure}}::{{closure}}
4: tokio::runtime::task::raw::poll
5: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
6: tokio::runtime::scheduler::multi_thread::worker::run
7: tokio::runtime::task::raw::poll
8: std::sys_common::backtrace::__rust_begin_short_backtrace
9: core::ops::function::FnOnce::call_once{{vtable.shim}}
10: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
std::sys::unix::thread::Thread::new::thread_start
at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
11: <unknown>
12: __clone
) } }
stack backtrace:
0: rust_begin_unwind
at ./rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at ./rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
2: core::result::unwrap_failed
at ./rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/result.rs:1649:5
3: coordinator::service::CoordService::new::{{closure}}
4: cnosdb::server::ServiceBuilder::create_coord::{{closure}}
5: cnosdb::server::ServiceBuilder::build_query_storage::{{closure}}
6: cnosdb::main::{{closure}}
7: cnosdb::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
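One thing I noticed in the first ERROR line: file_size (687783936) minus the data position (687783756) is exactly 180, the same as the number of bytes actually read, while the record header declares 469 bytes. So the last WAL record appears to run past the end of the file, which would match the WalTruncated error in the panic. The sketch below only reproduces that arithmetic. `check_tail` and `TailCheck` are hypothetical names of mine, not CnosDB APIs, and this does not explain why the record was cut short.

```rust
/// Outcome of comparing one record's declared length with the file length.
#[derive(Debug, PartialEq)]
enum TailCheck {
    /// The whole payload fits inside the file.
    Complete,
    /// The payload runs past end of file: only `available` bytes exist,
    /// and `missing` bytes of the declared payload are absent.
    Truncated { available: u64, missing: u64 },
}

/// Hypothetical helper (not part of CnosDB): checks whether a record whose
/// payload starts at `data_pos` and declares `declared_len` bytes fits in a
/// file of `file_size` bytes.
fn check_tail(file_size: u64, data_pos: u64, declared_len: u64) -> TailCheck {
    // Bytes that remain between the payload start and end of file.
    let available = file_size.saturating_sub(data_pos);
    if declared_len <= available {
        TailCheck::Complete
    } else {
        TailCheck::Truncated {
            available,
            missing: declared_len - available,
        }
    }
}
```

With the values from the log, `check_tail(687783936, 687783756, 469)` gives `Truncated { available: 180, missing: 289 }`, i.e. 289 bytes of the final record are not in the file.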
Has anyone encountered this problem? What could have caused it?