Kubernetes pods cannot reach each other after Google Cloud service mesh installation

I’m trying out the Cloud service mesh that was announced in Google Next 2024.

I’m trying it out on a standard cluster, with 4 nodes, following this guide:
https://cloud.google.com/service-mesh/docs/onboarding/provision-control-plane

During the first few days, everything seemed fine. But then the pods in my namespace suddenly could not reach each other anymore.

The pods reach each other via Kubernetes Services of type NodePort: Pod A -> Kubernetes Service B -> Deployment B -> Pod B.
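
For context, each Service is a plain NodePort Service selecting the pods of the corresponding Deployment, roughly like this (the names and ports below are simplified placeholders):

apiVersion: v1
kind: Service
metadata:
  name: service-b
  namespace: my-namespace
spec:
  type: NodePort
  selector:
    app: app-b          # matches the pod labels of Deployment B
  ports:
    - port: 8080        # port that Pod A calls
      targetPort: 8080  # container port of Pod B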

At first I thought there were some issues during the installation process, so I followed https://cloud.google.com/service-mesh/docs/uninstall to uninstall and reinstall the service mesh. But after a few days, the above issue came back.

Following the guide, I have verified that the service mesh is active and running:

gcloud container fleet mesh describe
createTime: '2024-07-16T01:31:43.731699992Z'
membershipSpecs:
  projects/123456789123/locations/asia-southeast1/memberships/my-cluster:
    mesh:
      management: MANAGEMENT_AUTOMATIC
membershipStates:
  projects/123456789123/locations/asia-southeast1/memberships/my-cluster:
    servicemesh:
      conditions:
      - code: VPCSC_GA_SUPPORTED
        details: This control plane supports VPC-SC GA.
        documentationLink: http://cloud.google.com/service-mesh/docs/managed/vpc-sc
        severity: INFO
      controlPlaneManagement:
        details:
        - code: REVISION_READY
          details: 'Ready: asm-managed'
        implementation: TRAFFIC_DIRECTOR
        state: ACTIVE
      dataPlaneManagement:
        details:
        - code: OK
          details: Service is running.
        state: ACTIVE
    state:
      code: OK
      description: |-
        Revision ready for use: asm-managed.
        All Canonical Services have been reconciled successfully.
      updateTime: '2024-08-15T07:19:40.846768015Z'
name: projects/my-project/locations/global/features/servicemesh
resourceState:
  state: ACTIVE
spec: {}
updateTime: '2024-08-15T05:34:35.455135300Z'

kubectl describe controlplanerevision -n istio-system
Name:         asm-managed
Namespace:    istio-system
Labels:       app.kubernetes.io/created-by=mesh.googleapis.com
              istio.io/owned-by=mesh.googleapis.com
              mesh.cloud.google.com/managed-cni-enabled=true
Annotations:  mesh.cloud.google.com/proxy: {"managed":"true"}
              mesh.cloud.google.com/vpcsc-ga: false
API Version:  mesh.cloud.google.com/v1beta1
Kind:         ControlPlaneRevision
Metadata:
  Creation Timestamp:  2024-08-13T10:32:05Z
  Generation:          1
  Resource Version:    1101894861
  UID:                 cd4b26f5-d7d1-4a45-b220-2adcf618fa23
Spec:
  Channel:  regular
  Type:     managed_service
Status:
  Conditions:
    Last Transition Time:  2024-08-15T05:55:32Z
    Message:               The provisioning process has completed successfully
    Reason:                Provisioned
    Status:                True
    Type:                  Reconciled
    Last Transition Time:  2024-08-15T05:55:32Z
    Message:               Provisioning has finished
    Reason:                ProvisioningFinished
    Status:                True
    Type:                  ProvisioningFinished
    Last Transition Time:  2024-08-15T05:55:32Z
    Message:               Provisioning has not stalled
    Reason:                NotStalled
    Status:                False
    Type:                  Stalled
Events:                    <none>

I have verified that my namespace is labeled with istio-injection=enabled, and I can see the istio-proxy sidecar injected into every pod. The istio-proxy sidecar seems to be starting properly:

INFO 2024-08-15T05:42:54.174322288Z [resource.labels.containerName: istio-validation] 2024-08-15T05:42:54.173151Z info Starting iptables validation. This check verifies that iptables rules are properly established for the network.
INFO 2024-08-15T05:42:54.174372588Z [resource.labels.containerName: istio-validation] 2024-08-15T05:42:54.173236Z info Listening on 127.0.0.1:15001
INFO 2024-08-15T05:42:54.174376768Z [resource.labels.containerName: istio-validation] 2024-08-15T05:42:54.173423Z info Listening on 127.0.0.1:15006
INFO 2024-08-15T05:42:54.174380538Z [resource.labels.containerName: istio-validation] 2024-08-15T05:42:54.173810Z info Local addr 127.0.0.1:15006
INFO 2024-08-15T05:42:54.174403398Z [resource.labels.containerName: istio-validation] 2024-08-15T05:42:54.173823Z info Original addr 127.0.0.1: 15002
INFO 2024-08-15T05:42:54.174411368Z [resource.labels.containerName: istio-validation] 2024-08-15T05:42:54.173920Z info Validation passed, iptables rules established
INFO 2024-08-15T05:43:03.301899214Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301721Z info FLAG: --concurrency="0"
INFO 2024-08-15T05:43:03.301965054Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301757Z info FLAG: --domain="my-namespace.svc.cluster.local"
INFO 2024-08-15T05:43:03.301971444Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301764Z info FLAG: --help="false"
INFO 2024-08-15T05:43:03.301976384Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301768Z info FLAG: --log_as_json="false"
INFO 2024-08-15T05:43:03.301981364Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301771Z info FLAG: --log_caller=""
INFO 2024-08-15T05:43:03.301986134Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301774Z info FLAG: --log_output_level="default:info"
INFO 2024-08-15T05:43:03.301990984Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301777Z info FLAG: --log_rotate=""
INFO 2024-08-15T05:43:03.301995774Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301781Z info FLAG: --log_rotate_max_age="30"
INFO 2024-08-15T05:43:03.302000094Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301784Z info FLAG: --log_rotate_max_backups="1000"
INFO 2024-08-15T05:43:03.302004424Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301788Z info FLAG: --log_rotate_max_size="104857600"
INFO 2024-08-15T05:43:03.302009284Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301791Z info FLAG: --log_stacktrace_level="default:none"
INFO 2024-08-15T05:43:03.302014394Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301803Z info FLAG: --log_target="[stdout]"
INFO 2024-08-15T05:43:03.302018274Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301808Z info FLAG: --meshConfig="./etc/istio/config/mesh"
INFO 2024-08-15T05:43:03.302021414Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301811Z info FLAG: --outlierLogPath=""
INFO 2024-08-15T05:43:03.302024434Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301815Z info FLAG: --profiling="true"
INFO 2024-08-15T05:43:03.302027454Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301818Z info FLAG: --proxyComponentLogLevel="misc:error"
INFO 2024-08-15T05:43:03.302030504Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301822Z info FLAG: --proxyLogLevel="warning"
INFO 2024-08-15T05:43:03.302033784Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301826Z info FLAG: --serviceCluster="istio-proxy"
INFO 2024-08-15T05:43:03.302036864Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301838Z info FLAG: --stsPort="15463"
INFO 2024-08-15T05:43:03.302070544Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301905Z info FLAG: --templateFile=""
INFO 2024-08-15T05:43:03.302077534Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301916Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
INFO 2024-08-15T05:43:03.302082334Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301928Z info FLAG: --vklog="0"
INFO 2024-08-15T05:43:03.302087684Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.301940Z info Version 1.19.10-asm.6-491aae094c181ecc5467c78ddd3591b27a5c84cc-Clean
INFO 2024-08-15T05:43:03.302210354Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.302115Z warn failed running ulimit command:
INFO 2024-08-15T05:43:03.302843944Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.302379Z info Proxy role ips=[10.84.6.35] type=sidecar id=my-pod-c6c9d855d-fj9pz.my-namespace domain=my-namespace.svc.cluster.local
INFO 2024-08-15T05:43:03.302859354Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.302485Z info Apply proxy config from env {"discoveryAddress":"meshconfig.googleapis.com:443","proxyMetadata":{"CA_PROVIDER":"GoogleCA","CA_ROOT_CA":"/etc/ssl/certs/ca-certificates.crt","CA_TRUSTANCHOR":"","FLEET_PROJECT_NUMBER":"123456789123","GCP_METADATA":"my-project|123456789123|my-cluster|asia-southeast1-a","OUTPUT_CERTS":"/etc/istio/proxy","PROXY_CONFIG_XDS_AGENT":"true","XDS_AUTH_PROVIDER":"gcp","XDS_ROOT_CA":"/etc/ssl/certs/ca-certificates.crt"},"meshId":"proj-123456789123"}
INFO 2024-08-15T05:43:03.302865444Z [resource.labels.containerName: istio-proxy] {}
INFO 2024-08-15T05:43:03.305151943Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.305025Z info cpu limit detected as 2, setting concurrency
INFO 2024-08-15T05:43:03.305726793Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.305632Z info Effective config: binaryPath: /usr/local/bin/envoy
INFO 2024-08-15T05:43:03.305742173Z [resource.labels.containerName: istio-proxy] concurrency: 2
INFO 2024-08-15T05:43:03.305748413Z [resource.labels.containerName: istio-proxy] configPath: ./etc/istio/proxy
INFO 2024-08-15T05:43:03.305754654Z [resource.labels.containerName: istio-proxy] controlPlaneAuthPolicy: MUTUAL_TLS
INFO 2024-08-15T05:43:03.305759594Z [resource.labels.containerName: istio-proxy] discoveryAddress: meshconfig.googleapis.com:443
INFO 2024-08-15T05:43:03.305765343Z [resource.labels.containerName: istio-proxy] drainDuration: 45s
INFO 2024-08-15T05:43:03.305769883Z [resource.labels.containerName: istio-proxy] meshId: proj-123456789123
INFO 2024-08-15T05:43:03.305774314Z [resource.labels.containerName: istio-proxy] proxyAdminPort: 15000
INFO 2024-08-15T05:43:03.305779434Z [resource.labels.containerName: istio-proxy] proxyMetadata:
INFO 2024-08-15T05:43:03.305824074Z [resource.labels.containerName: istio-proxy] CA_PROVIDER: GoogleCA
INFO 2024-08-15T05:43:03.305829003Z [resource.labels.containerName: istio-proxy] CA_ROOT_CA: /etc/ssl/certs/ca-certificates.crt
INFO 2024-08-15T05:43:03.305833774Z [resource.labels.containerName: istio-proxy] CA_TRUSTANCHOR: ""
INFO 2024-08-15T05:43:03.305838354Z [resource.labels.containerName: istio-proxy] FLEET_PROJECT_NUMBER: "123456789123"
INFO 2024-08-15T05:43:03.305844074Z [resource.labels.containerName: istio-proxy] GCP_METADATA: my-project|123456789123|my-cluster|asia-southeast1-a
INFO 2024-08-15T05:43:03.305848414Z [resource.labels.containerName: istio-proxy] OUTPUT_CERTS: /etc/istio/proxy
INFO 2024-08-15T05:43:03.305852854Z [resource.labels.containerName: istio-proxy] PROXY_CONFIG_XDS_AGENT: "true"
INFO 2024-08-15T05:43:03.305857503Z [resource.labels.containerName: istio-proxy] XDS_AUTH_PROVIDER: gcp
INFO 2024-08-15T05:43:03.305862883Z [resource.labels.containerName: istio-proxy] XDS_ROOT_CA: /etc/ssl/certs/ca-certificates.crt
INFO 2024-08-15T05:43:03.305867523Z [resource.labels.containerName: istio-proxy] serviceCluster: istio-proxy
INFO 2024-08-15T05:43:03.305872163Z [resource.labels.containerName: istio-proxy] statNameLength: 189
INFO 2024-08-15T05:43:03.305876963Z [resource.labels.containerName: istio-proxy] statusPort: 15020
INFO 2024-08-15T05:43:03.305881603Z [resource.labels.containerName: istio-proxy] terminationDrainDuration: 5s
INFO 2024-08-15T05:43:03.305886443Z [resource.labels.containerName: istio-proxy] tracing:
INFO 2024-08-15T05:43:03.305890974Z [resource.labels.containerName: istio-proxy] zipkin:
INFO 2024-08-15T05:43:03.305895783Z [resource.labels.containerName: istio-proxy] address: zipkin.istio-system:9411
INFO 2024-08-15T05:43:03.305900343Z [resource.labels.containerName: istio-proxy] {}
INFO 2024-08-15T05:43:03.305904854Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.305656Z info JWT policy is third-party-jwt
INFO 2024-08-15T05:43:03.305910383Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.305662Z info using credential fetcher of JWT type in my-project.svc.id.goog trust domain
INFO 2024-08-15T05:43:03.305915654Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.305674Z info stsclient GKE_CLUSTER_URL is not set, fetched cluster URL from metadata server: "https://container.googleapis.com/v1/projects/my-project/locations/asia-southeast1-a/clusters/my-cluster"
INFO 2024-08-15T05:43:03.317072502Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.316881Z info stsserver Start listening on 127.0.0.1:15463
INFO 2024-08-15T05:43:03.317265023Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.317124Z info platform detected is GCP
INFO 2024-08-15T05:43:03.318458742Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.318285Z info Workload SDS socket not found. Starting Istio SDS Server
INFO 2024-08-15T05:43:03.318508033Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.318321Z info CA Endpoint meshca.googleapis.com:443, provider GoogleCA
INFO 2024-08-15T05:43:03.318540842Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.318310Z info Opening status port 15020
INFO 2024-08-15T05:43:03.319659752Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.319553Z info ads All caches have been synced up in 18.479078ms, marking server ready
INFO 2024-08-15T05:43:03.320576522Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.320469Z info xdsproxy Initializing with upstream address "meshconfig.googleapis.com:443" and cluster "cn-my-project-asia-southeast1-a-my-cluster"
INFO 2024-08-15T05:43:03.326444482Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.326206Z info Pilot SAN: [meshconfig.googleapis.com]
INFO 2024-08-15T05:43:03.327752852Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.327567Z info LRS for MCP is enabled
INFO 2024-08-15T05:43:03.328462842Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.328306Z info Starting proxy agent
INFO 2024-08-15T05:43:03.328481212Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.328331Z info starting
INFO 2024-08-15T05:43:03.328488422Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.328373Z info Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields --log-format %Y-%m-%dT%T.%fZ %l envoy %n %g:%# %v thread=%t -l warning --component-log-level misc:error --concurrency 2]
INFO 2024-08-15T05:43:03.336131571Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.335792Z info sds Starting SDS grpc server
INFO 2024-08-15T05:43:03.336158351Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.335939Z info starting Http service at 127.0.0.1:15004
INFO 2024-08-15T05:43:03.422323294Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.422160Z info token Prepared federated token request for aud "identitynamespace:my-project.svc.id.goog:https://container.googleapis.com/v1/projects/my-project/locations/asia-southeast1-a/clusters/my-cluster"
INFO 2024-08-15T05:43:03.435771723Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.435623Z info token Prepared federated token request for aud "identitynamespace:my-project.svc.id.goog:https://container.googleapis.com/v1/projects/my-project/locations/asia-southeast1-a/clusters/my-cluster"
INFO 2024-08-15T05:43:03.478943780Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.478781Z info token fetched federated token latency=56.396066ms ttl=3599
INFO 2024-08-15T05:43:03.481962759Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481629Z info googleca Cert created with GoogleCA asia-southeast1-a chain length 3
INFO 2024-08-15T05:43:03.481995519Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481744Z info cache generated new workload certificate latency=161.087877ms ttl=23h59m59.518258291s
INFO 2024-08-15T05:43:03.482001569Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481775Z info cache Root cert has changed, start rotating root cert
INFO 2024-08-15T05:43:03.482006279Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481793Z info ads XDS: Incremental Pushing ConnectedEndpoints:0 Version:
INFO 2024-08-15T05:43:03.482240759Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.482118Z info cache returned workload trust anchor from cache ttl=23h59m59.517883411s
INFO 2024-08-15T05:43:03.482810239Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.482563Z info token fetched federated token latency=46.758686ms ttl=3599
INFO 2024-08-15T05:43:03.531310485Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.531089Z info token fetched access token latency=48.340526ms ttl=59m59.468912605s
INFO 2024-08-15T05:43:03.537333795Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.537095Z info token fetched access token latency=58.073286ms ttl=59m59.462908625s
INFO 2024-08-15T05:43:03.537381655Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.537212Z info xdsproxy connected to upstream XDS server: meshconfig.googleapis.com:443
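
For reference, the injection label and the sidecars were checked with roughly these commands (my-namespace stands in for the actual namespace):

kubectl get namespace my-namespace --show-labels
kubectl get pods -n my-namespace -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'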

I looked into the kube-system namespace of my cluster, where some new components were added for the service mesh: a DaemonSet “istio-cni-node”, a DaemonSet “snk”, and a Deployment “mdp-controller”.

I checked the logs of istio-cni-node and mdp-controller and they seem to be fine.
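
Those log checks were just along these lines:

kubectl -n kube-system logs daemonset/istio-cni-node --tail=200
kubectl -n kube-system logs deployment/mdp-controller --tail=200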

But there are issues with the snk DaemonSet: 2 of its pods keep restarting because they are OOMKilled.

Name         Status    Restarts   Created on
snk-cqfg4    Running   95         Aug 13, 2024, 11:23:07 PM
snk-455cv    Running   13         Aug 15, 2024, 7:38:19 AM
snk-7fg4n    Running   0          Aug 15, 2024, 7:54:17 AM
snk-95hc4    Running   0          Aug 15, 2024, 11:13:48 AM
WARNING 2024-08-15T07:37:58Z [resource.labels.nodeName: gke-my-cluster-default-pool-6bd5b0f1-shpw] Memory cgroup out of memory: Killed process 709708 (snk) total-vm:2167296kB, anon-rss:29588kB, file-rss:36552kB, shmem-rss:0kB, UID:2692 pgtables:360kB oom_score_adj:999
DEFAULT 2024-08-15T07:37:58.013949Z [resource.labels.nodeName: gke-my-cluster-default-pool-6bd5b0f1-shpw] I0815 07:37:58.013791 2649 log_monitor.go:159] New status generated: &{Source:kernel-monitor Events:[{Severity:warn Timestamp:2024-08-15 07:37:57.564042014 +0000 UTC m=+25195.175669820 Reason:OOMKilling Message:Memory cgroup out of memory: Killed process 709708 (snk) total-vm:2167296kB, anon-rss:29588kB, file-rss:36552kB, shmem-rss:0kB, UID:2692 pgtables:360kB oom_score_adj:999}] Conditions:[{Type:KernelDeadlock Status:False Transition:2024-08-15 00:38:03.556889826 +0000 UTC m=+1.168517601 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False Transition:2024-08-15 00:38:03.556889935 +0000 UTC m=+1.168517721 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]}

snk pod memory usage chart

As I understand it, a DaemonSet runs one pod on each node of the cluster, so I guess the OOM issue has something to do with the specific node.

From the logs of the snk pod, it seems to be gathering the IP addresses of the pods on its node.
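
To see what it does before getting killed, the logs of the previous (OOMKilled) container instance can be checked, for example:

kubectl -n kube-system logs snk-cqfg4 --previous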

The 2 nodes hosting the 2 snk pods with many OOMKilled restarts also host more pods than the other 2 nodes, mostly pods created by Kubernetes CronJobs in the cluster. Those CronJob pods have Istio sidecar injection disabled, because the istio-proxy sidecar prevents the CronJob pods from completing and shutting down.
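
For reference, injection is disabled on those CronJob pods via the standard annotation on the pod template, something like this (name, schedule, and image are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
  namespace: my-namespace
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"  # keep the istio-proxy sidecar out of these pods
        spec:
          restartPolicy: Never
          containers:
            - name: job
              image: my-job-image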

If I stop scheduling more pods to those 2 nodes, the OOMKilled issue seems to stop.
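
For example, keeping new pods off one of those nodes can be done by cordoning it:

kubectl cordon gke-my-cluster-default-pool-6bd5b0f1-shpw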

I tried to increase the memory limit of the snk DaemonSet to 100MiB, but it gets reverted back to 30MiB a few minutes later, presumably by the managed service mesh.
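
The change I tried was roughly this (the container name snk is an assumption based on the DaemonSet name; kubectl patch uses a strategic merge patch here, which merges the containers list by name):

kubectl -n kube-system patch daemonset snk \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"snk","resources":{"limits":{"memory":"100Mi"}}}]}}}}'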

I am not sure what to do from here, since I cannot simply stop pods from being scheduled onto those 2 nodes (they run my business logic). And is the OOMKilled issue with the snk pods really the root cause of my pods not being able to reach each other?

INFO 2024-08-15T05:43:03.328488422Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.328373Z info Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields --log-format %Y-%m-%dT%T.%fZ %l envoy %n %g:%# %v thread=%t -l warning --component-log-level misc:error --concurrency 2]
INFO 2024-08-15T05:43:03.336131571Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.335792Z info sds Starting SDS grpc server
INFO 2024-08-15T05:43:03.336158351Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.335939Z info starting Http service at 127.0.0.1:15004
INFO 2024-08-15T05:43:03.422323294Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.422160Z info token Prepared federated token request for aud "identitynamespace:my-project.svc.id.goog:https://container.googleapis.com/v1/projects/my-project/locations/asia-southeast1-a/clusters/my-cluster"
INFO 2024-08-15T05:43:03.435771723Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.435623Z info token Prepared federated token request for aud "identitynamespace:my-project.svc.id.goog:https://container.googleapis.com/v1/projects/my-project/locations/asia-southeast1-a/clusters/my-cluster"
INFO 2024-08-15T05:43:03.478943780Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.478781Z info token fetched federated token latency=56.396066ms ttl=3599
INFO 2024-08-15T05:43:03.481962759Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481629Z info googleca Cert created with GoogleCA asia-southeast1-a chain length 3
INFO 2024-08-15T05:43:03.481995519Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481744Z info cache generated new workload certificate latency=161.087877ms ttl=23h59m59.518258291s
INFO 2024-08-15T05:43:03.482001569Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481775Z info cache Root cert has changed, start rotating root cert
INFO 2024-08-15T05:43:03.482006279Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.481793Z info ads XDS: Incremental Pushing ConnectedEndpoints:0 Version:
INFO 2024-08-15T05:43:03.482240759Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.482118Z info cache returned workload trust anchor from cache ttl=23h59m59.517883411s
INFO 2024-08-15T05:43:03.482810239Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.482563Z info token fetched federated token latency=46.758686ms ttl=3599
INFO 2024-08-15T05:43:03.531310485Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.531089Z info token fetched access token latency=48.340526ms ttl=59m59.468912605s
INFO 2024-08-15T05:43:03.537333795Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.537095Z info token fetched access token latency=58.073286ms ttl=59m59.462908625s
INFO 2024-08-15T05:43:03.537381655Z [resource.labels.containerName: istio-proxy] 2024-08-15T05:43:03.537212Z info xdsproxy connected to upstream XDS server: meshconfig.googleapis.com:443

I looked into the kube-system namespace of my cluster, where some new resources were added for the service mesh: a DaemonSet “istio-cni-node”, a DaemonSet “snk”, and a Deployment “mdp-controller”.

I checked the logs of istio-cni-node and mdp-controller and they seem to be fine.
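
For reference, I checked those logs with commands along these lines (from memory, so the exact flags might differ slightly):

# logs of the Istio CNI node agent and the managed data plane controller
kubectl -n kube-system logs daemonset/istio-cni-node --all-containers --tail=200
kubectl -n kube-system logs deployment/mdp-controller --tail=200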

But there were issues with the snk DaemonSet: 2 of its pods keep restarting due to OOMKilled.

Name         Status    Restarts   Created on
snk-cqfg4    Running   95         Aug 13, 2024, 11:23:07 PM
snk-455cv    Running   13         Aug 15, 2024, 7:38:19 AM
snk-7fg4n    Running   0          Aug 15, 2024, 7:54:17 AM
snk-95hc4    Running   0          Aug 15, 2024, 11:13:48 AM
WARNING 2024-08-15T07:37:58Z [resource.labels.nodeName: gke-my-cluster-default-pool-6bd5b0f1-shpw] Memory cgroup out of memory: Killed process 709708 (snk) total-vm:2167296kB, anon-rss:29588kB, file-rss:36552kB, shmem-rss:0kB, UID:2692 pgtables:360kB oom_score_adj:999
DEFAULT 2024-08-15T07:37:58.013949Z [resource.labels.nodeName: gke-my-cluster-default-pool-6bd5b0f1-shpw] I0815 07:37:58.013791 2649 log_monitor.go:159] New status generated: &{Source:kernel-monitor Events:[{Severity:warn Timestamp:2024-08-15 07:37:57.564042014 +0000 UTC m=+25195.175669820 Reason:OOMKilling Message:Memory cgroup out of memory: Killed process 709708 (snk) total-vm:2167296kB, anon-rss:29588kB, file-rss:36552kB, shmem-rss:0kB, UID:2692 pgtables:360kB oom_score_adj:999}] Conditions:[{Type:KernelDeadlock Status:False Transition:2024-08-15 00:38:03.556889826 +0000 UTC m=+1.168517601 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False Transition:2024-08-15 00:38:03.556889935 +0000 UTC m=+1.168517721 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]}

[Image: snk pod memory usage chart]

As I understand it, a DaemonSet runs one pod on each node of the cluster, so I guess the OOM issue has something to do with the node itself.
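
To confirm that, I listed the snk pods together with the nodes they run on, roughly like this:

# one snk pod per node; the RESTARTS column matches the table above
kubectl -n kube-system get pods -o wide | grep snk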

From the logs of the snk pod, it seems to be gathering the IP addresses of the pods on its node.
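
This is roughly how I looked at those logs (snk-cqfg4 is one of the frequently restarting pods from the table above; --previous shows the instance that was OOMKilled):

kubectl -n kube-system logs snk-cqfg4 --previous --tail=200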

The 2 nodes hosting the 2 snk pods with many OOMKilled restarts have more pods than the other 2 nodes, mostly pods started by Kubernetes CronJobs in the cluster. Those CronJob pods have Istio sidecar injection disabled, since the sidecar prevents a CronJob pod from completing and shutting down.
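
For context, injection on those CronJob pods is disabled with the standard sidecar.istio.io/inject: "false" annotation on the Job pod template. Shown here as a patch for illustration only (my-cronjob is a placeholder name; in reality the annotation is set in the manifests):

kubectl -n my-namespace patch cronjob my-cronjob --type merge -p \
  '{"spec":{"jobTemplate":{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/inject":"false"}}}}}}}'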

If I stop scheduling more pods onto those 2 nodes, the OOMKilled issue seems to stop.
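
One way to keep new pods off a node while testing this is to cordon it (node name taken from the OOM log above; uncordoning reverts it):

kubectl cordon gke-my-cluster-default-pool-6bd5b0f1-shpw
# kubectl uncordon gke-my-cluster-default-pool-6bd5b0f1-shpw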

I tried to increase the memory limit of the snk DaemonSet to 100MiB, but it got reverted back to 30MiB a few minutes later, presumably by the managed service mesh.
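
For reference, an edit along these lines is what I attempted (assuming the limit sits on the DaemonSet's first container; 100MiB written as the Kubernetes quantity 100Mi):

kubectl -n kube-system patch daemonset snk --type json -p \
  '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"100Mi"}]'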

I am not sure what to do from here. I cannot simply stop pods from being scheduled onto those 2 nodes, because they run my business logic. And is the OOMKilled issue with the snk pods really the root cause of my pods being unable to reach each other?
