Cause and solution of timeout when ssh_connect() in libssh is executed with ssm-over-ssh as ProxyCommand
awsspecWhen I connect to RDS in Private subnet, I found that Sequel Ace running in Sandbox Mode can’t do SSH tunneling with ssh-over-ssm as ProxyCommand, so I tried TablePlus introduced in this Issue, but it also failed.
SSH connection to EC2 instance through Session Manager with ssh-over-ssm - sambaiz-net
I enabled SSH Debug Log from Help, and found that libssh was used. So I tried executing ssh_connect() but it timed out after the connection.
$ cat libsshtest.c
#include <libssh/libssh.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void connect_through_ssm() {
ssh_session session = ssh_new();
if (session == NULL) {
fprintf(stderr, "failed to create a ssh_session for ssm\n");
return;
}
int timeout_sec = 30;
ssh_options_set(session, SSH_OPTIONS_TIMEOUT, &timeout_sec);
int remote_port = 22;
ssh_options_set(session, SSH_OPTIONS_HOST, "i-03b79ba2921757fe7");
ssh_options_set(session, SSH_OPTIONS_PORT, &remote_port);
int verbosity = SSH_LOG_FUNCTIONS;
ssh_options_set(session, SSH_OPTIONS_LOG_VERBOSITY, &verbosity);
if (ssh_connect(session) != SSH_OK) {
fprintf(stderr, "failed to connect to remote host through ssm: %s\n", ssh_get_error(session));
ssh_free(session);
return;
}
ssh_disconnect(session);
ssh_free(session);
}
int main() {
connect_through_ssm();
}
$ brew install libssh
$ clang -I /opt/homebrew/include -I ext -L /opt/homebrew/lib -lssh libsshtest.c
$ ./a.out
failed to connect to remote host through ssm: Timeout connecting to i-03b79ba2921757fe7
Set SSH_OPTIONS_LOG_VERBOSITY to output detailed logs.
int verbosity = SSH_LOG_FUNCTIONS;
ssh_options_set(session, SSH_OPTIONS_LOG_VERBOSITY, &verbosity);
From them, I found that ProxyCommand was executed and Banner was returned.
[2023/07/17 13:54:10.223126, 2] ssh_socket_connect_proxycommand: Executing proxycommand 'exec sh -c "~/.ssh/ssh-ssm.sh %h %r"'
[2023/07/17 13:54:10.224973, 2] ssh_socket_connect_proxycommand: ProxyCommand connection pipe: [3,4]
[2023/07/17 13:54:10.225031, 1] socket_callback_connected: Socket connection callback: 1 (0)
[2023/07/17 13:54:10.225045, 3] ssh_client_connection_callback: session_state=2
[2023/07/17 13:54:10.225066, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
[2023/07/17 13:54:10.225073, 4] ssh_socket_unbuffered_write: wrote 23
[2023/07/17 13:54:10.225079, 2] ssh_connect: Socket connecting, now waiting for the callbacks to work
[2023/07/17 13:54:10.225085, 3] ssh_connect: Actual timeout : 30000
[2023/07/17 13:54:10.225109, 4] ssh_socket_pollcallback: Poll callback on socket 4 (POLLOUT ), out buffer 0
[2023/07/17 13:54:15.126839, 4] ssh_socket_pollcallback: Poll callback on socket 4 (POLLIN ), out buffer 0
[2023/07/17 13:54:15.127027, 4] ssh_socket_unbuffered_read: read 336
[2023/07/17 13:54:15.127048, 3] callback_receive_banner: ssh_protocol_version_exchange: Updates Information Summary: available
[2023/07/17 13:54:15.127059, 3] callback_receive_banner: ssh_protocol_version_exchange: 1 Security notice(s)
[2023/07/17 13:54:15.127070, 3] callback_receive_banner: ssh_protocol_version_exchange: 1 Important Security notice(s)
[2023/07/17 13:54:15.127080, 3] callback_receive_banner: ssh_protocol_version_exchange:
[2023/07/17 13:54:15.127090, 3] callback_receive_banner: ssh_protocol_version_exchange: , #_
[2023/07/17 13:54:15.127099, 3] callback_receive_banner: ssh_protocol_version_exchange: ~\_ ####_ Amazon Linux 2023
[2023/07/17 13:54:15.127109, 3] callback_receive_banner: ssh_protocol_version_exchange: ~~ \_#####\
[2023/07/17 13:54:15.127118, 3] callback_receive_banner: ssh_protocol_version_exchange: ~~ \###|
[2023/07/17 13:54:15.127127, 3] callback_receive_banner: ssh_protocol_version_exchange: ~~ \#/ ___ https://aws.amazon.com/linux/amazon-linux-2023
[2023/07/17 13:54:15.127498, 3] callback_receive_banner: ssh_protocol_version_exchange: ~~ V~' '->
[2023/07/17 13:54:15.127510, 3] callback_receive_banner: ssh_protocol_version_exchange: ~~~ /
[2023/07/17 13:54:15.127520, 3] callback_receive_banner: ssh_protocol_version_exchange: ~~._. _/
[2023/07/17 13:54:15.127529, 3] callback_receive_banner: ssh_protocol_version_exchange: _/ _/
[2023/07/17 13:54:15.127538, 3] callback_receive_banner: ssh_protocol_version_exchange: _/m/'
[2023/07/17 13:54:40.226261, 1] ssh_connect: Timeout connecting to i-03b79ba2921757fe7
I looked at the code of libssh and found that callback_receive_banner() is assigned to ssh_socket.callbacks->data and called while s->state == SSH_SOCKET_CONNECTED. This loop is finished by receiving the string for protocol version exchange to change the state to SSH_SOCKET_CONNECTED_BANNER_RECEIVED.
I changed the log level of SSM Agent to trace to investigate why the version exchange is not completed and found that it opens a websocket connection to communicate with the client, performs a Handshake of SSM version, and then connects to SSH Server with Port plugin.
$ sudo vi /etc/amazon/ssm/seelog.xml
<seelog type="adaptive" mininterval="2000000" maxinterval="100000000" critmsgcount="500" minlevel="trace">
...
$ sudo service amazon-ssm-agent restart
$ sudo tail -f /var/log/amazon/ssm/amazon-ssm-agent.log
...
2023-07-17 14:37:38 INFO [ssm-session-worker] [xxxxx] [DataBackend] [pluginName=Port] Opening websocket connection to: wss://ssmmessages.ap-northeast-1.amazonaws.com/v1/data-channel/xxxxx?role=publish_subscribe
2023-07-17 14:37:38 INFO [ssm-session-worker] [xxxxx] [DataBackend] [pluginName=Port] Successfully opened websocket connection to: wss://ssmmessages.ap-northeast-1.amazonaws.com/v1/data-channel/xxxxx?role=publish_subscribe
2023-07-17 14:37:38 DEBUG [ssm-session-worker] [xxxxx] [DataBackend] [pluginName=Port] Send agent_session_state message with session status Connected
...
2023-07-17 14:37:39 INFO [ssm-session-worker] [xxxxx] [DataBackend] [pluginName=Port] Handshake successfully completed.
2023-07-17 14:37:39 DEBUG [ssm-session-worker] [xxxxx] [DataBackend] [pluginName=Port] Start separate go routine to read from port connection and write to data channel
2023-07-17 14:37:39 INFO [ssm-session-worker] [xxxxx] [DataBackend] [pluginName=Port] Plugin Port started
I connected to SSH Server with net.Dial() similar to the Port plugin, and then the server version string was sent immediately after the connection. After the version exchange was completed, the list of Key Exchange Method used for key exchange was sent.
package main
import (
"fmt"
"log"
"net"
)
func main() {
conn, err := net.Dial("tcp", "localhost:22")
if err != nil {
log.Fatal(err)
}
defer conn.Close()
packet := make([]byte, 1000)
if _, err = conn.Read(packet); err != nil {
log.Fatal(err)
}
fmt.Println(string(packet))
fmt.Println("Send client version")
conn.Write([]byte("SSH-2.0-OpenSSH_8.7\n"))
if _, err = conn.Read(packet); err != nil {
log.Fatal(err)
}
fmt.Println(string(packet))
}
// ./main
// SSH-2.0-OpenSSH_8.7
//
// Send client version
// �
// �m�ľ�P�(�ci����curve25519-sha256,[email protected],...
So I thought that the session-manager-plugin was doing something, but that’s not the case. Once I did start-session, the version string at the time of connection was returned.
% aws ssm start-session --document-name AWS-StartSSHSession --target "i-03b79ba2921757fe7"
Starting session with SessionId: xxxxxx
SSH-2.0-OpenSSH_8.7
It means AWS-StartSSHSession contains “SSH”, but it only establishes TCP connection and doesn’t perform SSH-related processes. I looked at the ssh-over-ssm script again and found the process of replacing it with ssh command if it is called from other than ssh command. ssh command performs version exchange, etc.
if [[ $(basename -- $(ps -o comm= -p $PPID)) != "ssh" ]]; then
exec ssh -o IdentityFile="${SSH_TMP_KEY}" -o ProxyCommand="$0 $1 $2" "$2@$1"
As a result, removed this exec part and ran it again, and the connection succeeded.