Statefulset DNS resolution fails when exposing a service #1772
Hello Alba,
You're exposing a service, but if your intention is to reach each pod
directly by name, I would recommend adding the "--headless" flag to the
"skupper expose" command and exposing the "statefulset" workload instead
of the "service".
Thank you,
…On Wed, Nov 6, 2024 at 9:56 AM Alba Cañete Garrucho < ***@***.***> wrote:
*Describe the bug*
I do not understand how the DNS resolution works between two clusters that
execute statefulsets. When using a single cluster, I can access a Pod
through its name (deploying a headless svc), but cannot do the same when
using Skupper. Am I missing something?
*How To Reproduce*
1. Two clusters were created using kubeadm v1.29.5, containerd as CRI
and Flannel as CNI.
*Edge cluster*
***@***.***:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
agx14 Ready <none> 19d v1.29.5
agx15 Ready <none> 19d v1.29.5
rpi42 Ready control-plane 19d v1.29.5
*HPC cluster*
***@***.***:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
nano1 Ready control-plane 19d v1.29.5
workstation Ready <none> 19d v1.29.5
2. Create a namespace with the same name in each cluster.
*Edge cluster*
kubectl create ns compss
*HPC cluster*
kubectl create ns compss
3. Deploy Skupper in each namespace. Since I am deploying it on-prem
and with private IP addresses, I am using NodePort.
*Edge cluster*. 192.168.50.15 is the IP address of the agx14 node.
skupper init -n compss --ingress nodeport --ingress-host 192.168.50.15
*HPC cluster*. 192.168.50.61 is the IP address of the workstation node.
skupper init -n compss --ingress nodeport --ingress-host 192.168.50.61
4. Link namespaces
*Edge cluster*
skupper -n compss token create edge.token
*HPC cluster*. The edge.token file was copied to a machine on the HPC
cluster.
skupper -n compss link create edge.token
Output
***@***.***:~$ skupper -n compss link status
Links created from this site:
There are no links configured or connected
Current links from other sites that are connected:
Incoming link from site 472fdc04-1406-4281-bbf9-81f5e5ad3737 on namespace compss
***@***.***:~$ skupper -n compss link status
Links created from this site:
Link link1 is connected
Current links from other sites that are connected:
There are no connected links
5. Deploy test applications in both clusters
The YAML file for the StatefulSet that runs in the edge cluster is
apiVersion: v1
kind: Service
metadata:
name: compss-matmul-4fc9d6
namespace: compss
spec:
clusterIP: None # This makes it a headless service
selector:
app: compss
wf_id: compss-matmul-4fc9d6
ports:
- name: port-22
protocol: TCP
port: 22
targetPort: ssh-port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: compss-matmul-4fc9d6-worker
namespace: compss
spec:
selector:
matchLabels:
app: compss
wf_id: compss-matmul-4fc9d6
pod-hostname: worker
serviceName: compss-matmul-4fc9d6
replicas: 2
ordinals:
start: 2
template:
metadata:
labels:
app: compss
wf_id: compss-matmul-4fc9d6
pod-hostname: worker
spec:
subdomain: compss-matmul-4fc9d6
dnsConfig:
searches:
- compss-matmul-4fc9d6.compss.svc.cluster.local
containers:
- name: worker
image: albabsc/compss-matmul:verge-0.1.8
command: [ "/usr/sbin/sshd", "-D" ]
resources:
limits:
memory: 2G
cpu: 4
ports:
- containerPort: 22
name: ssh-port
The YAML file for the StatefulSet that runs in the HPC cluster is
apiVersion: v1
kind: Service
metadata:
name: compss-matmul-4fc9d6
namespace: compss
spec:
clusterIP: None # This makes it a headless service
selector:
app: compss
wf_id: compss-matmul-4fc9d6
ports:
- name: port-22
protocol: TCP
port: 22
targetPort: ssh-port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: compss-matmul-4fc9d6-worker
namespace: compss
spec:
selector:
matchLabels:
app: compss
wf_id: compss-matmul-4fc9d6
pod-hostname: worker
serviceName: compss-matmul-4fc9d6
replicas: 2
template:
metadata:
labels:
app: compss
wf_id: compss-matmul-4fc9d6
pod-hostname: worker
spec:
subdomain: compss-matmul-4fc9d6
dnsConfig:
searches:
- compss-matmul-4fc9d6.compss.svc.cluster.local
containers:
- name: worker
image: albabsc/compss-matmul:verge-0.1.8
command: [ "/usr/sbin/sshd", "-D" ]
resources:
limits:
memory: 2G
cpu: 4
ports:
- containerPort: 22
name: ssh-port
6. Ensure connection among pods of the same cluster
*Edge cluster*
***@***.***:~$ kubectl -n compss get pods
NAME READY STATUS RESTARTS AGE
compss-matmul-4fc9d6-worker-0 1/1 Running 0 4m19s
compss-matmul-4fc9d6-worker-1 1/1 Running 0 4m18s
skupper-router-6895bb6f95-88hnj 2/2 Running 0 55m
skupper-service-controller-559ddbdd56-wnsvh 1/1 Running 0 55m
***@***.***:~$ kubectl -n compss exec -it compss-matmul-4fc9d6-worker-0 -- bash
***@***.***:/# ssh compss-matmul-4fc9d6-worker-1
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 6.8.0-47-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
Last login: Wed Nov 6 12:33:29 2024 from 10.244.1.160
***@***.***:~#
*HPC cluster*
***@***.***:~$ kubectl -n compss get pods
NAME READY STATUS RESTARTS AGE
compss-matmul-4fc9d6-worker-2 1/1 Running 0 4m48s
compss-matmul-4fc9d6-worker-3 1/1 Running 0 4m47s
skupper-router-748c487879-gvpxg 2/2 Running 0 56m
skupper-service-controller-6f69b974bd-grgzc 1/1 Running 0 56m
***@***.***:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-2 -- bash
***@***.***:/# ssh compss-matmul-4fc9d6-worker-3
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.10.192-tegra aarch64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
Last login: Wed Nov 6 12:49:41 2024 from 10.244.2.97
***@***.***:~#
7. Expose service with Skupper
Command executed in the *edge cluster*
skupper -n compss expose service compss-matmul-4fc9d6 --port 22 --address compss-matmul-4fc9d6
Check the service is correctly created
***@***.***:~$ skupper -n compss service status
Services exposed through Skupper:
╰─ compss-matmul-4fc9d6:22 (tcp)
***@***.***:~$ skupper -n compss service status
Services exposed through Skupper:
╰─ compss-matmul-4fc9d6:22 (tcp)
8. DNS resolution no longer works
When trying to ssh between two pods of the same cluster (e.g., edge)
***@***.***:~$ kubectl -n compss get pods
NAME READY STATUS RESTARTS AGE
compss-matmul-4fc9d6-worker-0 1/1 Running 0 4m19s
compss-matmul-4fc9d6-worker-1 1/1 Running 0 4m18s
skupper-router-6895bb6f95-88hnj 2/2 Running 0 55m
skupper-service-controller-559ddbdd56-wnsvh 1/1 Running 0 55m
***@***.***:~$ kubectl -n compss exec -it compss-matmul-4fc9d6-worker-0 -- bash
***@***.***:/# ssh compss-matmul-4fc9d6-worker-1
ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-1: No address associated with hostname
***@***.***:/# ssh compss-matmul-4fc9d6-worker-2
ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-2: No address associated with hostname
***@***.***:/#
*Expected behavior*
I would like every Pod of a StatefulSet to be reachable through its Pod
name, or to know which name I have to use, and whether that name is
different when a Pod in cluster 1 wants to access a Pod in cluster 2.
*Environment details*
- Skupper CLI: 1.8.1
- Skupper Operator (if applicable): none
- Platform: kubernetes
*Additional context*
Pods have the following /etc/resolv.conf file
search compss.svc.cluster.local svc.cluster.local cluster.local lan compss-matmul-4fc9d6.compss.svc.cluster.local
nameserver 10.96.0.10
options ndots:5
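To see why the short pod name resolves inside a single cluster, it can help to list the names a resolver would try. The sketch below is a simplified model of glibc-style search-list handling; that the image uses such a resolver is an assumption, not something confirmed in this thread. `candidate_names` is a hypothetical helper, and its inputs are the `search` and `ndots` values from the resolv.conf above.

```shell
#!/bin/sh
# Print, in order, the names a glibc-style resolver would try for a query.
# $1 = query name, $2 = ndots, remaining arguments = search domains.
candidate_names() {
  cn_name="$1"; cn_ndots="$2"; shift 2
  # A trailing dot means the name is already fully qualified: one lookup only.
  case "$cn_name" in
    *.) echo "$cn_name"; return 0 ;;
  esac
  # Count the dots in the query name.
  cn_only_dots=$(printf '%s' "$cn_name" | tr -cd '.')
  cn_dots=${#cn_only_dots}
  # With at least ndots dots, the name is tried as-is before the search list.
  if [ "$cn_dots" -ge "$cn_ndots" ]; then echo "${cn_name}."; fi
  for cn_domain in "$@"; do
    echo "${cn_name}.${cn_domain}."
  done
  # With fewer dots, the name is tried as-is only after the search list.
  if [ "$cn_dots" -lt "$cn_ndots" ]; then echo "${cn_name}."; fi
}

candidate_names compss-matmul-4fc9d6-worker-1 5 \
  compss.svc.cluster.local svc.cluster.local cluster.local lan \
  compss-matmul-4fc9d6.compss.svc.cluster.local
```

With no dots in the query and `ndots:5`, the search domains are tried first. The fifth candidate, `compss-matmul-4fc9d6-worker-1.compss-matmul-4fc9d6.compss.svc.cluster.local.`, has the form Kubernetes publishes for a pod behind a headless service, which is why the extra `dnsConfig.searches` entry makes the bare pod name usable.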
|
Hello @fgiorgetti, thanks for the quick answer :) I have also tried it and have not been able to make it work. What I have done is:
|
Since you're deploying the same statefulset and service on both namespaces,
could you try to
modify their names in one of the clusters, possibly just changing the
suffix in one of them?
This way, skupper will basically create different statefulset proxies and
headless services, and
will avoid name clashes with the generated resources on each
cluster/namespace.
Suppose you modify the suffix in one of your clusters, from 4fc9d6 to
4fc9d7, then you should be
able to reach your distinct pods, using the following names:
compss-matmul-4fc9d6-worker-0.compss-matmul-4fc9d6
compss-matmul-4fc9d6-worker-1.compss-matmul-4fc9d6
compss-matmul-4fc9d7-worker-0.compss-matmul-4fc9d7
compss-matmul-4fc9d7-worker-1.compss-matmul-4fc9d7
Basically on the remote namespaces, Skupper will create a statefulset and a
headless service that
have the same name (from the originally exposed statefulset) on the other
cluster/namespace. So if
your statefulsets and headless services have the same names on both sides,
I believe it won't work
as expected.
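As a small illustration of the naming scheme described above, the following sketch prints the `<pod>.<service>` names for one StatefulSet per cluster, using the example suffixes 4fc9d6 and 4fc9d7. `statefulset_pod_names` is a hypothetical helper that only formats strings; it does not talk to Kubernetes or Skupper, and it assumes ordinals start at 0.

```shell
#!/bin/sh
# Print the "<pod>.<service>" names of a StatefulSet whose ordinals start at 0.
# $1 = StatefulSet name, $2 = headless service name (spec.serviceName),
# $3 = number of replicas.
statefulset_pod_names() {
  spn_i=0
  while [ "$spn_i" -lt "$3" ]; do
    echo "$1-${spn_i}.$2"
    spn_i=$((spn_i + 1))
  done
}

# One StatefulSet per cluster, distinguished only by the suffix.
for suffix in 4fc9d6 4fc9d7; do
  statefulset_pod_names "compss-matmul-${suffix}-worker" "compss-matmul-${suffix}" 2
done
```

Because the suffix appears in both the StatefulSet name and the service name, the two sets of names cannot collide, which is the point of renaming one side.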
…On Wed, Nov 6, 2024 at 10:29 AM Alba Cañete Garrucho < ***@***.***> wrote:
Hello @fgiorgetti <https://github.com/fgiorgetti>, thanks for the quick
answer :)
I have also tried it and have not been able to make it work. What I have
done is:
1. Unexpose the service
skupper -n compss unexpose service compss-matmul-4fc9d6 --address compss-matmul-4fc9d6
and check
***@***.***:~$ skupper -n compss service status
No services defined
2. Exposed the statefulset in the *edge cluster* executing the
following command
skupper -n compss expose statefulset compss-matmul-4fc9d6-worker --headless --port 22
Now, two new proxy pods and a svc are created
***@***.***:~$ kubectl -n compss get pods
NAME READY STATUS RESTARTS AGE
compss-matmul-4fc9d6-proxy-0 1/1 Running 0 7m49s
compss-matmul-4fc9d6-proxy-1 1/1 Running 0 7m46s
compss-matmul-4fc9d6-worker-2 1/1 Running 0 57m
compss-matmul-4fc9d6-worker-3 1/1 Running 0 57m
skupper-router-748c487879-gvpxg 2/2 Running 0 108m
skupper-service-controller-6f69b974bd-grgzc 1/1 Running 0 108m
***@***.***:~$ kubectl -n compss get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
compss-matmul-4fc9d6 ClusterIP None <none> 22/TCP 55m
compss-matmul-4fc9d6-proxy ClusterIP None <none> 22/TCP 5m23s
skupper-router NodePort 10.96.210.62 <none> 55671:30581/TCP,45671:30524/TCP,8081:32381/TCP 106m
skupper-router-local ClusterIP 10.98.178.162 <none> 5671/TCP 106m
3. Tried to connect *to* a worker in the *edge cluster* *from* a
worker in the *HPC cluster*
***@***.***:~$ kubectl -n compss exec -it compss-matmul-4fc9d6-worker-0 -- bash
***@***.***:/# ssh compss-matmul-4fc9d6-worker-2
ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-2: No address associated with hostname
Also tried connecting to the newly created proxy pods
***@***.***:/# ssh compss-matmul-4fc9d6-proxy-0
ssh: Could not resolve hostname compss-matmul-4fc9d6-proxy-0: No address associated with hostname
|
Hello @fgiorgetti, I have modified the YAML files and now they are:
Edge cluster
HPC cluster
I have deployed both YAMLs and executed the following command on the edge cluster:
When the statefulset in the edge cluster gets exposed, the new pods appear in the HPC cluster
Now the DNS resolution is ok, but ssh fails to create the connection. Do you know if it has to do with the implementation of Skupper's security? The docker image has the ssh keys inside and I can ssh to pods in the same cluster
Pod in the same cluster
|
Hello Alba,
Looking at your statefulset, I noticed it has the following specification:
ordinals:
start: 2
Do you really need to set the start index for your worker pods?
If you remove it, I believe it should work for you, as the remote proxy
pods created by Skupper will have the
appropriate names and the local proxy pods (on the same cluster and
namespace of your exposed statefulset) will
target the correct local pods as well.
Otherwise, the worker pods are created as compss-matmul-4fc9d61-worker-2
and compss-matmul-4fc9d61-worker-3,
which is currently not supported as proxy pods won't work properly.
If you can remove the ordinals.start definition, then you should be
able to access your pods using <pod-name>.<*service-name*>,
with *service-name* being the value of *spec.serviceName* from your
statefulset, i.e.:
- compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
- compss-matmul-4fc9d61-worker-1.compss-matmul-4fc9d61
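Before exposing a StatefulSet, it may be worth checking whether its manifest sets `ordinals.start`. The sketch below is a deliberately naive text scan, not a YAML parser: `ordinals_start` is a hypothetical helper, and it assumes `start:` sits on the line directly after `ordinals:`, as in the manifests in this thread. It prints 0 when the field is absent, which is the Kubernetes default.

```shell
#!/bin/sh
# Read a manifest on stdin and print the value of "ordinals: / start:",
# or 0 when no such pair of lines is found.
ordinals_start() {
  awk '
    /^[[:space:]]*ordinals:[[:space:]]*$/ { in_ord = 1; next }
    in_ord && /^[[:space:]]*start:/       { print $2; found = 1; exit }
                                          { in_ord = 0 }
    END { if (!found) print 0 }
  '
}

# Fragment of the edge-cluster StatefulSet from this thread.
ordinals_start <<'EOF'
spec:
  serviceName: compss-matmul-4fc9d61
  replicas: 2
  ordinals:
    start: 2
EOF
```

A non-zero result means the pods will not be named `...-worker-0` and `...-worker-1`, which is the situation described above as unsupported. Against a live cluster, the same field should be readable with `kubectl -n compss get statefulset <name> -o jsonpath='{.spec.ordinals.start}'`.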
…On Sat, Nov 9, 2024 at 1:44 PM Alba Cañete Garrucho < ***@***.***> wrote:
hello @fgiorgetti <https://github.com/fgiorgetti>, I have modified the
YAML files and now they are
Edge cluster
apiVersion: v1
kind: Service
metadata:
name: compss-matmul-4fc9d61
namespace: compss
spec:
clusterIP: None # This makes it a headless service
selector:
app: compss
wf_id: compss-matmul-4fc9d61
ports:
- name: port-22
protocol: TCP
port: 22
targetPort: ssh-port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: compss-matmul-4fc9d61-worker
namespace: compss
spec:
selector:
matchLabels:
app: compss
wf_id: compss-matmul-4fc9d61
pod-hostname: worker
serviceName: compss-matmul-4fc9d61
replicas: 2
ordinals:
start: 2
template:
metadata:
labels:
app: compss
wf_id: compss-matmul-4fc9d61
pod-hostname: worker
spec:
subdomain: compss-matmul-4fc9d61
dnsConfig:
searches:
- compss-matmul-4fc9d61.compss.svc.cluster.local
containers:
- name: worker
image: albabsc/compss-matmul:verge-0.1.8
command: [ "/usr/sbin/sshd", "-D" ]
resources:
limits:
memory: 2G
cpu: 4
ports:
- containerPort: 22
name: ssh-port
HPC cluster
apiVersion: v1
kind: Service
metadata:
name: compss-matmul-4fc9d6
namespace: compss
spec:
clusterIP: None # This makes it a headless service
selector:
app: compss
wf_id: compss-matmul-4fc9d6
ports:
- name: port-22
protocol: TCP
port: 22
targetPort: ssh-port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: compss-matmul-4fc9d6-worker
namespace: compss
spec:
selector:
matchLabels:
app: compss
wf_id: compss-matmul-4fc9d6
pod-hostname: worker
serviceName: compss-matmul-4fc9d6
replicas: 2
template:
metadata:
labels:
app: compss
wf_id: compss-matmul-4fc9d6
pod-hostname: worker
spec:
subdomain: compss-matmul-4fc9d6
dnsConfig:
searches:
- compss-matmul-4fc9d6.compss.svc.cluster.local
containers:
- name: worker
image: albabsc/compss-matmul:verge-0.1.8
command: [ "/usr/sbin/sshd", "-D" ]
resources:
limits:
memory: 2G
cpu: 4
ports:
- containerPort: 22
name: ssh-port
I have deployed both YAMLs and executed the following command on the edge
cluster:
***@***.***:~$ skupper -n compss expose statefulset compss-matmul-4fc9d61-worker --headless --port 22
*When the statefulset in the edge cluster gets exposed, the new pods
appear in the HPC cluster*
***@***.***:~$ kubectl -n compss get pods
NAME READY STATUS RESTARTS AGE
compss-matmul-4fc9d6-worker-0 1/1 Running 0 30s
compss-matmul-4fc9d6-worker-1 1/1 Running 0 29s
compss-matmul-4fc9d61-worker-0 1/1 Running 0 9s
compss-matmul-4fc9d61-worker-1 1/1 Running 0 7s
skupper-router-f88bff6f9-4mskr 2/2 Running 0 98s
skupper-service-controller-655bf9fbf8-8gdln 1/1 Running 0 98s
*Now the DNS resolution is ok, but ssh fails to create the connection. Do
you know if it has to do with the implementation of Skupper's security? The
docker image has the ssh keys inside and I can ssh to pods in the same
cluster*
Pod in a different cluster
***@***.***:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-0 -- bash
***@***.***:/# nslookup compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
;; Got recursion not available from 10.96.0.10
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local
Address: 10.244.1.184
;; Got recursion not available from 10.96.0.10
***@***.***:/# ssh compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
ssh: connect to host compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 port 22: Connection refused
Pod in the same cluster
***@***.***:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-0 -- bash
***@***.***:/# ssh compss-matmul-4fc9d6-worker-1
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 6.8.0-47-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
Last login: Sat Nov 9 16:39:10 2024 from 10.244.1.182
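The thread shows two different ssh failures, and they point at different layers. "Could not resolve hostname" means the name lookup failed before any connection was attempted; "Connection refused" means the name resolved and the address was reached, but nothing accepted the connection on that port. The sketch below simply classifies an ssh error message along those lines; `classify_ssh_error` is a hypothetical helper that only matches the message text quoted in this thread.

```shell
#!/bin/sh
# Classify an ssh client error message.
# Prints "dns" when the name did not resolve, "refused" when the name
# resolved but the TCP connection was rejected, and "other" otherwise.
classify_ssh_error() {
  case "$1" in
    *"Could not resolve hostname"*) echo "dns" ;;
    *"Connection refused"*)         echo "refused" ;;
    *)                              echo "other" ;;
  esac
}

classify_ssh_error "ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-2: No address associated with hostname"
classify_ssh_error "ssh: connect to host compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 port 22: Connection refused"
```

In the "refused" case DNS is no longer the problem, so the next thing to inspect is whatever is supposed to be listening behind the resolved address.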
|
Hello @fgiorgetti, I deployed it as you mention but I still get connection refused
Pods
|
Hello again @fgiorgetti :) With further debugging I have realized that the IP of the Pod and the IP resolved by the DNS with Skupper are different:
IP resolved by DNS: 10.244.1.227
with ssh:
|
Further debugging, even though I executed the command
|
Some clusters might have securitycontextconstraints preventing pods from
running as root,
therefore they won't be able to bind system ports (<1024). I am not sure if
that is what
you're facing, but make sure the worker pods created by Skupper on the
remote cluster do
not have any issue binding port 22, for example:
$ kubectl logs compss-matmul-4fc9d61-worker-0 | grep denied | tail -1
2024-11-20 14:12:02.035365 +0000 FLOW_LOG (info) LOG [hlGYm:11628922] BEGIN END parent=hlGYm:0 logSeverity=3 logText=LOG_ROUTER: Listener ingress:22: proactor listener error on 0.0.0.0:22: proton:io (Permission denied - listen on 0.0.0.0:22) sourceFile=/build/src/adaptors/adaptor_listener.c sourceLine=172
This could indicate that the pods created by Skupper are unable to bind
port 22.
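The check above can be sketched as a small filter that pulls the port out of such a log line and flags it when it falls in the system range. `denied_port` is a hypothetical helper; the sample input is the relevant part of the log line quoted above. The 1024 threshold is the usual Linux default and is an assumption here, since it can be changed with the `net.ipv4.ip_unprivileged_port_start` sysctl.

```shell
#!/bin/sh
# Read router log lines on stdin and print the port of the last
# "Permission denied - listen on <ip>:<port>" error, if there is one.
denied_port() {
  sed -n 's/.*Permission denied - listen on [0-9.]*:\([0-9][0-9]*\)).*/\1/p' | tail -n 1
}

# Sample input taken from the log line in this thread.
log='LOG_ROUTER: Listener ingress:22: proactor listener error on 0.0.0.0:22: proton:io (Permission denied - listen on 0.0.0.0:22) sourceFile=/build/src/adaptors/adaptor_listener.c'

port=$(printf '%s\n' "$log" | denied_port)
if [ -n "$port" ] && [ "$port" -lt 1024 ]; then
  echo "port $port is below 1024; binding it usually needs extra privileges"
fi
```

On a cluster, the input would come from `kubectl logs` on the pod that Skupper created, as in the command shown above.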
Anyway, I have made some small modifications to your original statefulset
to use port 2222
instead, as a way to ensure the system ports are not the root cause.
https://gist.github.com/fgiorgetti/953722df46088a98b2f5f49d6a22ec93
I have deployed the Statefulset above (basically yours with a custom image)
to a local cluster named west.
Then I linked the west cluster to a remote cluster I am calling east.
At this point, the statefulset is running on the west cluster and I have
not yet exposed it to the Skupper network.
Here is how it looks from the west cluster:
west $ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
compss-matmul-4fc9d61-worker-0 1/1 Running 0 8m47s 10.244.5.213 minikube <none> <none>
compss-matmul-4fc9d61-worker-1 1/1 Running 0 8m23s 10.244.5.214 minikube <none> <none>
west $ kubectl get service compss-matmul-4fc9d61 -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
compss-matmul-4fc9d61 ClusterIP None <none> 2222/TCP 9m6s app=compss,wf_id=compss-matmul-4fc9d61
Running an SSH client pod on the "west" cluster, where the SSHD worker pods
are actually running, I can establish a connection
(note that the IP returned is the pod ip and that skupper does not
manipulate IPs or DNS):
west $ kubectl run ssh-client -it --image quay.io/fgiorgetti/rhel9-sshd -- bash
***@***.*** /]# ping compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
PING compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (10.244.5.213) 56(84) bytes of data.
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (10.244.5.213): icmp_seq=1 ttl=64 time=0.050 ms
***@***.*** /]# ssh -p 2222 ***@***.***
The authenticity of host '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222 ([10.244.5.213]:2222)' can't be established.
ED25519 key fingerprint is SHA256:lyyTCcGkE2kYBaaIFUzPVYD1vmT4Si/S7mTUPiNTJAs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222' (ED25519) to the list of known hosts.
***@***.*** ~]#
Skupper has not been involved so far.
Now, let's expose the statefulset running on the west cluster and try to
access its worker pods from the remote cluster (east).
west $ skupper expose statefulset compss-matmul-4fc9d61-worker --port 2222 --headless
statefulset compss-matmul-4fc9d61-worker exposed as compss-matmul-4fc9d61
Looking at the "east" cluster now:
east $ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
compss-matmul-4fc9d61-worker-0 1/1 Running 0 24s 172.17.44.224 10.240.0.16 <none> <none>
compss-matmul-4fc9d61-worker-1 1/1 Running 0 21s 172.17.59.174 10.240.0.4 <none> <none>
east $ kubectl get service compss-matmul-4fc9d61 -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
compss-matmul-4fc9d61 ClusterIP None <none> 2222/TCP 39s internal.skupper.io/service=compss-matmul-4fc9d61
Now that everything is ready, let me run the ssh-client there.
Observe that the IP is correct and that I am able to access the SSH server:
east $ kubectl run ssh-client -it --image quay.io/fgiorgetti/rhel9-sshd -- bash
***@***.*** /]# ping compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
PING compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (172.17.44.224) 56(84) bytes of data.
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (172.17.44.224): icmp_seq=1 ttl=63 time=0.110 ms
***@***.*** /]# ssh -p 2222 ***@***.***
The authenticity of host '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222 ([172.17.44.224]:2222)' can't be established.
ED25519 key fingerprint is SHA256:lyyTCcGkE2kYBaaIFUzPVYD1vmT4Si/S7mTUPiNTJAs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222' (ED25519) to the list of known hosts.
Last login: Wed Nov 20 14:53:37 2024 from 10.244.5.215
***@***.*** ~]#
Would you be able to try again using the modified YAMLs (with port 2222
instead)?
…On Tue, Nov 19, 2024 at 12:57 PM Alba Cañete Garrucho < ***@***.***> wrote:
Further debugging, even though I executed the command skupper -n compss
expose statefulset compss-matmul-4fc9d61-worker --headless --port 22,
when trying to list the services exposed I get nothing...
***@***.***:~$ skupper -n compss service status
No services defined
|
Hello @fgiorgetti ,
which you can get executing |