I am using Flink (1.16.1 on YARN) to read from Kafka and write to Kafka. It is a simple job, but I am facing two problems:
1. The Flink checkpoint always stays in the in-progress state.
2. The job logs do not contain any useful messages.
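For context, the pipeline is essentially the following (a minimal sketch, assuming the legacy Kafka connectors that appear in the logs below; the broker address, group id, and serialization schema are placeholders, not my exact code):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class Kafka2KafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 60 s; the JM log below shows checkpoint 1 being
        // triggered roughly one minute after the job switched to RUNNING.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "mengh-test-group");     // placeholder

        // Source topic "mengh-test" and sink topic "pa-first-topic" are the
        // topic names that appear in the TM log.
        env.addSource(new FlinkKafkaConsumer<>("mengh-test", new SimpleStringSchema(), props))
           .addSink(new FlinkKafkaProducer<>("pa-first-topic", new SimpleStringSchema(), props));

        env.execute("xh_kafka2kafka");
    }
}
```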
- Job running state (screenshot omitted)
- JM log:
2024-06-03 19:35:01[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Job [默认机构]-[mengh-test]-[xh_kafka2kafka]-[2] (50a99ba35e78fc2dada1595a64c1095d) switched from state CREATED to RUNNING.
2024-06-03 19:35:01[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Source: Kafka采集_1 (1/1) (67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0) switched from CREATED to SCHEDULED.
2024-06-03 19:35:01[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Sink: Kafka输出_2 (1/1) (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) switched from CREATED to SCHEDULED.
2024-06-03 19:35:01[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.jobmaster.JobMaster][INFO] - Connecting to ResourceManager akka.tcp://flink@xh2:36541/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.hadoop.hdfs.DFSClient][INFO] - Created HDFS_DELEGATION_TOKEN token 71 for paUser on ha-hdfs:nameservice1
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.hadoop.hdfs.DFSClient][INFO] - Created HDFS_DELEGATION_TOKEN token 72 for paUser on ha-hdfs:nameservice1
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.security.token.KerberosDelegationTokenManager][INFO] - Tokens update task started with 64799860 ms delay
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.hadoop.ipc.Client][WARN] - Failed to connect to server: xh3/10.100.1.167:8030: retries get failed due to exceeded maximum allowed retries number: 0
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_25]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[?:1.8.0_25]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) ~[flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) ~[flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.Client.call(Client.java:1381) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.Client.call(Client.java:1345) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at com.sun.proxy.$Proxy40.registerApplicationMaster(Unknown Source) [?:?]
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_25]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_25]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_25]
    at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at com.sun.proxy.$Proxy41.registerApplicationMaster(Unknown Source) [?:?]
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:236) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:228) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:168) [flink-shaded-hadoop-2-uber-2.8.3-9.0.jar:2.8.3-9.0]
    at org.apache.flink.yarn.YarnResourceManagerDriver.registerApplicationMaster(YarnResourceManagerDriver.java:525) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.yarn.YarnResourceManagerDriver.initializeInternal(YarnResourceManagerDriver.java:186) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.resourcemanager.active.AbstractResourceManagerDriver.initialize(AbstractResourceManagerDriver.java:92) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager.initialize(ActiveResourceManager.java:171) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:269) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:241) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:198) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.lambda$start$0(AkkaRpcActor.java:622) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState$$Lambda$448/1738199324.run(Unknown Source) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:621) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:190) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$445/1346710180.apply(Unknown Source) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.actor.Actor.aroundReceive(Actor.scala:537) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.actor.Actor.aroundReceive$(Actor.scala:535) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:548) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:231) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243) [flink-rpc-akka_b38f95be-66ba-4627-bef8-8e33e48bf6cf.jar:1.16.1]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_25]
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:902) [?:1.8.0_25]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1689) [?:1.8.0_25]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644) [?:1.8.0_25]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_25]
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider][INFO] - Failing over to rm2
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Recovered 0 containers from previous attempts ([]).
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Recovered 0 workers from previous attempt.
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.externalresource.ExternalResourceUtils][INFO] - Enabled external resources: []
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl][INFO] - Upper bound of the thread pool size is 500
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy][INFO] - yarn.client.max-cached-nodemanagers-proxies : 0
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.jobmaster.JobMaster][INFO] - Resolved ResourceManager address, beginning registration
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Registering job manager …@akka.tcp://flink@xh2:36541/user/rpc/jobmanager_1 for job 50a99ba35e78fc2dada1595a64c1095d.
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Registered job manager …@akka.tcp://flink@xh2:36541/user/rpc/jobmanager_1 for job 50a99ba35e78fc2dada1595a64c1095d.
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.jobmaster.JobMaster][INFO] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager][INFO] - Received resource requirements from job 50a99ba35e78fc2dada1595a64c1095d: [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, numberOfRequiredSlots=1}]
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=198.400mb (208037478 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=57.600mb (60397978 bytes), numSlots=1}, current pending count: 1.
2024-06-03 19:35:02[flink-akka.actor.default-dispatcher-19][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Requesting new TaskExecutor container with resource TaskExecutorProcessSpec {cpuCores=1.0, frameworkHeapSize=128.000mb (134217728 bytes), frameworkOffHeapSize=128.000mb (134217728 bytes), taskHeapSize=198.400mb (208037478 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemorySize=57.600mb (60397978 bytes), jvmMetaspaceSize=256.000mb (268435456 bytes), jvmOverheadSize=192.000mb (201326592 bytes), numSlots=1}, priority 1.
2024-06-03 19:35:07[Checkpoint Timer][org.apache.flink.runtime.checkpoint.CheckpointFailureManager][INFO] - Failed to trigger checkpoint for job 50a99ba35e78fc2dada1595a64c1095d since Checkpoint triggering task Source: Kafka采集_1 (1/1) of job 50a99ba35e78fc2dada1595a64c1095d is not being executed at the moment. Aborting checkpoint. Failure reason: Not all required tasks are currently running..
2024-06-03 19:35:07[AMRM Heartbeater thread][org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl][INFO] - Received new token for : xh3:36051
2024-06-03 19:35:07[flink-akka.actor.default-dispatcher-19][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Received 1 containers.
2024-06-03 19:35:07[flink-akka.actor.default-dispatcher-19][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Received 1 containers with priority 1, 1 pending container requests.
2024-06-03 19:35:07[flink-akka.actor.default-dispatcher-19][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Removing container request Capability[<memory:1024, vCores:1>]Priority[1].
2024-06-03 19:35:07[flink-akka.actor.default-dispatcher-19][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Accepted 1 requested containers, returned 0 excess containers, 0 pending container requests of resource <memory:1024, vCores:1>.
2024-06-03 19:35:07[cluster-io-thread-4][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - TaskExecutor container_e02_1716186778189_0029_01_000002(xh3:36051) will be started on xh3 with TaskExecutorProcessSpec {cpuCores=1.0, frameworkHeapSize=128.000mb (134217728 bytes), frameworkOffHeapSize=128.000mb (134217728 bytes), taskHeapSize=198.400mb (208037478 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemorySize=57.600mb (60397978 bytes), jvmMetaspaceSize=256.000mb (268435456 bytes), jvmOverheadSize=192.000mb (201326592 bytes), numSlots=1}.
2024-06-03 19:35:07[cluster-io-thread-4][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - TM:Adding keytab hdfs://nameservice1/user/paUser/.flink/application_1716186778189_0029/paUser.keytab to the container local resource bucket
2024-06-03 19:35:08[cluster-io-thread-4][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Creating container launch context for TaskManagers
2024-06-03 19:35:08[cluster-io-thread-4][org.apache.flink.yarn.YarnResourceManagerDriver][INFO] - Starting TaskManagers
2024-06-03 19:35:08[flink-akka.actor.default-dispatcher-19][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Requested worker container_e02_1716186778189_0029_01_000002(xh3:36051) with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=198.400mb (208037478 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=57.600mb (60397978 bytes), numSlots=1}.
2024-06-03 19:35:08[org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0][org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl][INFO] - Processing Event EventType: START_CONTAINER for Container container_e02_1716186778189_0029_01_000002
2024-06-03 19:35:21[flink-akka.actor.default-dispatcher-24][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Registering TaskManager with ResourceID container_e02_1716186778189_0029_01_000002(xh3:36051) (akka.tcp://flink@xh3:45749/user/rpc/taskmanager_0) at ResourceManager
2024-06-03 19:35:21[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Worker container_e02_1716186778189_0029_01_000002(xh3:36051) is registered.
2024-06-03 19:35:21[flink-akka.actor.default-dispatcher-6][org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager][INFO] - Worker container_e02_1716186778189_0029_01_000002(xh3:36051) with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=198.400mb (208037478 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=57.600mb (60397978 bytes), numSlots=1} was requested in current attempt. Current pending count after registering: 0.
2024-06-03 19:35:24[flink-akka.actor.default-dispatcher-22][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Source: Kafka采集_1 (1/1) (67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0) switched from SCHEDULED to DEPLOYING.
2024-06-03 19:35:24[flink-akka.actor.default-dispatcher-22][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Deploying Source: Kafka采集_1 (1/1) (attempt #0) with attempt id 67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0 and vertex id 32327f23cd3a7e9293b6f6e514156c67_0 to container_e02_1716186778189_0029_01_000002 @ xh3 (dataPort=45797) with allocation id 5de59394618459594cea507b0b706534
2024-06-03 19:35:24[flink-akka.actor.default-dispatcher-22][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Sink: Kafka输出_2 (1/1) (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) switched from SCHEDULED to DEPLOYING.
2024-06-03 19:35:24[flink-akka.actor.default-dispatcher-22][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Deploying Sink: Kafka输出_2 (1/1) (attempt #0) with attempt id 67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0 and vertex id 2534dea4e3192b38956d113863687316_0 to container_e02_1716186778189_0029_01_000002 @ xh3 (dataPort=45797) with allocation id 5de59394618459594cea507b0b706534
2024-06-03 19:35:25[flink-akka.actor.default-dispatcher-22][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Sink: Kafka输出_2 (1/1) (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) switched from DEPLOYING to INITIALIZING.
2024-06-03 19:35:25[flink-akka.actor.default-dispatcher-22][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Source: Kafka采集_1 (1/1) (67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0) switched from DEPLOYING to INITIALIZING.
2024-06-03 19:35:46[flink-akka.actor.default-dispatcher-21][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Sink: Kafka输出_2 (1/1) (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) switched from INITIALIZING to RUNNING.
2024-06-03 19:35:47[flink-akka.actor.default-dispatcher-21][org.apache.flink.runtime.executiongraph.ExecutionGraph][INFO] - Source: Kafka采集_1 (1/1) (67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0) switched from INITIALIZING to RUNNING.
2024-06-03 19:36:07[Checkpoint Timer][org.apache.flink.runtime.checkpoint.CheckpointCoordinator][INFO] - Triggering checkpoint 1 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1717414567372 for job 50a99ba35e78fc2dada1595a64c1095d.
- TM log:
2024-06-03 19:35:24[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.runtime.taskmanager.Task][INFO] - Loading JAR files for task Sink: Kafka输出_2 (1/1)#0 (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) [DEPLOYING].
2024-06-03 19:35:24[Source: Kafka采集_1 (1/1)#0][org.apache.flink.streaming.runtime.tasks.StreamTask][INFO] - Using job/cluster config to configure application-defined state backend: org.apache.flink.runtime.state.hashmap.HashMapStateBackend@164a5a6a
2024-06-03 19:35:24[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.streaming.runtime.tasks.StreamTask][INFO] - Using job/cluster config to configure application-defined state backend: org.apache.flink.runtime.state.hashmap.HashMapStateBackend@4baf67bb
2024-06-03 19:35:24[Source: Kafka采集_1 (1/1)#0][org.apache.flink.streaming.runtime.tasks.StreamTask][INFO] - Using application-defined state backend: org.apache.flink.runtime.state.hashmap.HashMapStateBackend@32c0868a
2024-06-03 19:35:24[Source: Kafka采集_1 (1/1)#0][org.apache.flink.runtime.state.StateBackendLoader][INFO] - State backend loader loads the state backend as HashMapStateBackend
2024-06-03 19:35:24[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.streaming.runtime.tasks.StreamTask][INFO] - Using application-defined state backend: org.apache.flink.runtime.state.hashmap.HashMapStateBackend@798815c7
2024-06-03 19:35:24[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.runtime.state.StateBackendLoader][INFO] - State backend loader loads the state backend as HashMapStateBackend
2024-06-03 19:35:24[Source: Kafka采集_1 (1/1)#0][org.apache.flink.streaming.runtime.tasks.StreamTask][INFO] - Using job/cluster config to configure application-defined checkpoint storage: JobManagerCheckpointStorage (checkpoints to JobManager) ( maxStateSize: 5242880)
2024-06-03 19:35:24[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.streaming.runtime.tasks.StreamTask][INFO] - Using job/cluster config to configure application-defined checkpoint storage: JobManagerCheckpointStorage (checkpoints to JobManager) ( maxStateSize: 5242880)
2024-06-03 19:35:25[Source: Kafka采集_1 (1/1)#0][org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory][WARN] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2024-06-03 19:35:25[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.runtime.taskmanager.Task][INFO] - Sink: Kafka输出_2 (1/1)#0 (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) switched from DEPLOYING to INITIALIZING.
2024-06-03 19:35:25[Source: Kafka采集_1 (1/1)#0][org.apache.flink.runtime.taskmanager.Task][INFO] - Source: Kafka采集_1 (1/1)#0 (67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0) switched from DEPLOYING to INITIALIZING.
2024-06-03 19:35:25[Source: Kafka采集_1 (1/1)#0][org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase][INFO] - Consumer subtask 0 has no restore state.
2024-06-03 19:35:25[Source: Kafka采集_1 (1/1)#0][cn.com.bsfit.pipeace.component.flink.kafka.KafkaSourceComponent][INFO] - consumer krb5Path is /vdir/opt/hadoop/yarn/local/usercache/paUser/appcache/application_1716186778189_0029/container_e02_1716186778189_0029_01_000002/tempFile/8bef0b9c2306b9d7d40fb95e940d0afckrb5.conf
2024-06-03 19:35:46[Sink: Kafka输出_2 (1/1)#0][cn.com.bsfit.org.apache.flink.streaming.connectors.kafka.KafkaSinkComponent][INFO] - producer krb5Path is /vdir/opt/hadoop/yarn/local/usercache/paUser/appcache/application_1716186778189_0029/container_e02_1716186778189_0029_01_000002/tempFile/8bef0b9c2306b9d7d40fb95e940d0afckrb5.conf
2024-06-03 19:35:46[Sink: Kafka输出_2 (1/1)#0][cn.com.bsfit.org.apache.flink.streaming.connectors.kafka.KafkaSinkComponent][INFO] - producer jaasStr is com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab="/vdir/opt/hadoop/yarn/local/usercache/paUser/appcache/application_1716186778189_0029/container_e02_1716186778189_0029_01_000002/tempFile/8bef0b9c2306b9d7d40fb95e940d0afckadm5.keytab" principal="paUser@TDH" useTicketCache=false storeKey=true refreshKrb5Config=true debug=true;
2024-06-03 19:35:46[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction][INFO] - KafkaSinkComponent 1/1 - no state to restore
2024-06-03 19:35:46[Sink: Kafka输出_2 (1/1)#0][cn.com.bsfit.org.apache.flink.streaming.connectors.kafka.PaFlinkKafkaProducer][INFO] - Starting FlinkKafkaInternalProducer (1/1) to produce into default topic pa-first-topic
2024-06-03 19:35:46[Sink: Kafka输出_2 (1/1)#0][org.apache.flink.runtime.taskmanager.Task][INFO] - Sink: Kafka输出_2 (1/1)#0 (67303c1e5204a0e0ca67159cd535d445_2534dea4e3192b38956d113863687316_0_0) switched from INITIALIZING to RUNNING.
2024-06-03 19:35:47[Source: Kafka采集_1 (1/1)#0][org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase][INFO] - Consumer subtask 0 will start reading the following 3 partitions from the committed group offsets in Kafka: [KafkaTopicPartition{topic='mengh-test', partition=2}, KafkaTopicPartition{topic='mengh-test', partition=0}, KafkaTopicPartition{topic='mengh-test', partition=1}]
2024-06-03 19:35:47[Source: Kafka采集_1 (1/1)#0][org.apache.flink.runtime.taskmanager.Task][INFO] - Source: Kafka采集_1 (1/1)#0 (67303c1e5204a0e0ca67159cd535d445_32327f23cd3a7e9293b6f6e514156c67_0_0) switched from INITIALIZING to RUNNING.
2024-06-03 19:35:47[Legacy Source Thread - Source: Kafka采集_1 (1/1)#0][org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase][INFO] - Consumer subtask 0 creating fetcher with offsets {KafkaTopicPartition{topic='mengh-test', partition=2}=-915623761773, KafkaTopicPartition{topic='mengh-test', partition=0}=-915623761773, KafkaTopicPartition{topic='mengh-test', partition=1}=-915623761773}.
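One detail in the TM log stands out to me: the sink is built on TwoPhaseCommitSinkFunction, i.e. our PaFlinkKafkaProducer wrapper runs the Kafka producer in transactional (exactly-once) mode, and a checkpoint can only complete once the sink's pre-commit (transaction flush) succeeds. For reference, this is how such a sink is normally created with the legacy connector (an illustrative sketch, not our wrapper's code; the transaction.timeout.ms value is my assumption, chosen because the Flink producer defaults to 1 hour while brokers commonly cap transaction.max.timeout.ms at 15 minutes):

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties producerProps = new Properties();
producerProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
// Keep the producer's transaction timeout at or below the broker's
// transaction.max.timeout.ms; a mismatch can make the transactional sink
// fail or stall, which in turn blocks checkpoint completion.
producerProps.setProperty("transaction.timeout.ms", "900000"); // 15 min, assumption

FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
        "pa-first-topic", // default topic, as seen in the TM log
        (KafkaSerializationSchema<String>) (element, timestamp) ->
                new ProducerRecord<>("pa-first-topic", element.getBytes(StandardCharsets.UTF_8)),
        producerProps,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE);
```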
I have read the job logs carefully but cannot find any useful message; they suggest everything is fine. The only exception is the "Connection refused" WARN in the JM log above, and that just comes from the first ResourceManager address being tried before the client fails over to rm2. Note also that the JM log ends with "Triggering checkpoint 1" and, at least in this excerpt, no "Completed checkpoint" line ever follows (the earlier "Failed to trigger checkpoint ... Not all required tasks are currently running" entry happened only during startup, before the tasks were deployed).
Please tell me why my job's checkpoints always stay in the in-progress state, and how to make the job checkpoint normally. The sketch below shows the checkpoint settings as I understand them.
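For reference, based on what the logs show (HashMapStateBackend plus JobManagerCheckpointStorage with maxStateSize 5242880), I believe the checkpoint-related settings in play amount to roughly the following; the explicit timeout and the commented-out durable storage path are illustrative additions, not my current configuration:

```java
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new HashMapStateBackend()); // "HashMapStateBackend" in the TM log
env.enableCheckpointing(60_000);

CheckpointConfig checkpointConfig = env.getCheckpointConfig();
// Without an explicit timeout, a checkpoint can stay "in progress" for up to
// the default 10 minutes before it is declared failed; a shorter timeout at
// least surfaces a stuck checkpoint in the logs quickly.
checkpointConfig.setCheckpointTimeout(2 * 60 * 1000); // illustrative
// The TM log shows JobManagerCheckpointStorage (maxStateSize: 5242880 bytes);
// a durable, file-based alternative would be e.g.:
// checkpointConfig.setCheckpointStorage("hdfs://nameservice1/flink-checkpoints");
```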