Why are double backslashes used in sed when running ssh?

https://stackoverflow.com/questions/12126059

28-06-2021

Question

I ran across the following in Gentoo Linux's wiki about dynamic jumphost list:

    ProxyCommand ssh $(echo %h | sed 's/+[^+]*$//;s/\([^+%%]*\)%%\([^+]*\)$/\2 -l \1/;s/:/ -p /') nc -w1 $(echo %h | sed 's/^.*+//;/:/!s/$/ %p/;s/:/ /')

It works, but I would like to understand the sed expression completely.

Reading its original reference, I was able to get a good understanding of the recursive invocation of the command, using the Host *+* pattern. But I have two questions:

The expression uses %%. To see why, I used ssh -v, and observed that when the ssh client parses $HOME/.ssh/config, the first % seems to be stripped. To confirm this, I downloaded the OpenSSH source code, but readconf.c didn't give me a clue. I am new to the OpenSSH source code, but am not afraid to compile it with debug info and run it under gdb. Nevertheless, if there is a quicker way to confirm my conjecture, I would appreciate a hint.

The ssh -v also revealed that:

[...]
debug1: Executing proxy command: exec ssh $(echo zackp%node0+zackp%node1+node3 | sed 's/+[^+]*$//;s/\\([^+%]*\\)%\\([^+]*\\)$/\\2 -l \\1/;s/:/ -p /')
[....]

i.e. the \( is now escaped with a \ in the subshell. Why is this necessary?

Thanks,

--Zack

Solution

Good question. It's a pretty tortuous command! It sounds like you've pretty much got it, though. On your machine, the first sed strips the last plus-separated hop (the final destination) off the host string; for convenience, the last remaining token then has any user and port extracted and turned into options (-l and -p). The second sed keeps only that final hop and turns it into a host and port to pass to netcat. ssh on your machine then connects to what is left of the chain and executes netcat on the last relay to reach the final destination. If what is left still contains a +, it matches the Host *+* pattern again, so the same process happens again for the shorter chain, and so on, until all the hops are done, with a netcat instance running on each relay to forward the traffic. Pretty tidy bit of command-line fun!
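
To see what each sed produces, the two expressions can be run by hand on the host string from the question's ssh -v output. This is only a sketch: the port value 22 is an assumption standing in for what ssh would substitute for %p, and the %% pairs are written as single % because that is what the shell receives.

```shell
# Host string taken from the debug1 line in the question.
host='zackp%node0+zackp%node1+node3'
port=22   # assumption: stands in for ssh's %p

# First sed: drop the last hop, then turn user%host into "host -l user"
# and host:port into "host -p port".
ssh_args=$(echo "$host" | sed 's/+[^+]*$//;s/\([^+%]*\)%\([^+]*\)$/\2 -l \1/;s/:/ -p /')

# Second sed: keep only the last hop, append the port if none was given,
# and turn host:port into "host port" for nc.
nc_args=$(echo "$host" | sed 's/^.*+//;/:/!s/$/ '"$port"'/;s/:/ /')

echo "ssh $ssh_args nc -w1 $nc_args"
# prints: ssh zackp%node0+node1 -l zackp nc -w1 node3 22
```

The host argument in the result still contains a +, so it would match Host *+* again and go through the same ProxyCommand with one hop fewer.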

Your specific questions:

Why are the % signs escaped? This is specific to the ProxyCommand option! From the man page regarding ProxyCommand:

In the command string, any occurrence of ‘%h’ will be substituted by the host name to connect, ‘%p’ by the port, and ‘%r’ by the remote user name.

Like all well-behaved unix utilities, when there's a metacharacter going on, the natural thing is to use that character doubled to represent a literal. Otherwise, there's no way to represent certain strings! It was probably just added by the programmer out of neatness, without thinking that someone would write his own mini-syntax for jump lists using % and post it on the Gentoo wiki!

The % codes are specific to this option, so the escaping is probably buried somewhere near where the option is handled in the OpenSSH source.
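
A quick way to convince yourself of the effect without touching gdb is to imitate the expansion in the shell. The function below is a hypothetical stand-in, not OpenSSH code: expand_tokens is a name made up for this sketch, and it only handles %% and %h, copying everything else through unchanged.

```shell
# Crude imitation of ProxyCommand token expansion (NOT the OpenSSH code).
# Assumption: only %% (literal percent) and %h (host name) are handled.
expand_tokens() {
    template=$1
    host=$2
    out=
    while [ -n "$template" ]; do
        case $template in
            %%*) out="$out%";     template=${template#%%} ;;
            %h*) out="$out$host"; template=${template#%h} ;;
            *)   # copy one ordinary character through unchanged
                 rest=${template#?}
                 out="$out${template%"$rest"}"
                 template=$rest ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_tokens 'echo %h | sed "s/\([^+%%]*\)%%\([^+]*\)$/\2 -l \1/"' 'zackp%node1'
# prints: echo zackp%node1 | sed "s/\([^+%]*\)%\([^+]*\)$/\2 -l \1/"
```

Each %% in the template comes out as a single %, which matches what the question observed in the ssh -v output, and a % inside the substituted host name is left alone.
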

Why are the backslashes doubled in the ssh -v output? Fiddly question! The string specified as the ProxyCommand option isn't run by ssh directly; it's specifically executed "using the user's shell". So, what goes in the option is meant to be user-friendly, so you can type into your ~/.ssh/config what you'd type in your shell.

Now, most people (including me!) aren't too fussed about 100% precise logging, but the OpenBSD guys have a strnvis function that OpenSSH passes over all log strings before outputting them. It encodes control characters and other nasties so that the log output gives a readable record of the precise (null-free) buffer passed in by the string logging functions. This is great, but the only trick is that when reading the logs, you have to 'strunvis' it back to its original form.

Basically, the backslash is an oddity of their logging format. It isn't passed to the shell.
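
One way to check this, sketched here on the assumption of an ordinary sed with POSIX basic regular expressions: run the expression once as typed and once copied verbatim from the debug1 line. If the doubled backslashes really reached the shell, sed would look for literal backslashes and parentheses in the host string and the user/host split would never happen.

```shell
host='zackp%node0+zackp%node1+node3'

# Expression as typed in the config (with %% already reduced to %).
as_typed=$(echo "$host" | sed 's/+[^+]*$//;s/\([^+%]*\)%\([^+]*\)$/\2 -l \1/;s/:/ -p /')

# Expression copied verbatim from the log, doubled backslashes included.
as_logged=$(echo "$host" | sed 's/+[^+]*$//;s/\\([^+%]*\\)%\\([^+]*\\)$/\\2 -l \\1/;s/:/ -p /')

echo "$as_typed"    # prints: zackp%node0+node1 -l zackp
echo "$as_logged"   # prints: zackp%node0+zackp%node1
```

Since the jump chain from the question works, the shell must be getting the single-backslash form; the doubled form exists only in the log text.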

Now, I'm guessing here (I don't think it's worth delving too deeply!), but basically the question's about the output of the logging ssh spits out when it's being verbose. I've written logging for process launching before, and it's a bit of a sloppy art, given how complicated arguments can be (embedded newlines? trailing whitespace? crazy quotes?). You don't often need a 100% "accurate" way of losslessly logging the arguments to exec, because it's too tedious. It looks like the author of the OpenSSH code here, when hunting for a single string to log, just spat out the escaped form of the string he had handy for passing as the last argument to sh. It's not a 'perfect' representation of what's going to be exec'ed (because I suspect some whitespace gets lost in logging) and it's perhaps not the most user-friendly thing to log (because it's got more escaping than you typed in!), but it's fine.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow