Question

I have a question that's been bothering me for some time. Is the Common Lisp format function reversible (at least to some degree) in that the format string could be used to retrieve original arguments from format's output? I am aware that the mapping is not one-to-one (one example is the ~@:(~a~) which turns input to uppercase and is not reversible), so necessarily some information is lost. What I have in mind exactly is rather an alternative to regular expressions for string parsing. For example, I would like to be able to write:

(destructure-format t "[~{~a~^, ~}]" "[0, 1, 2]")

and get the response:

=> (0 1 2)

Are you aware of any such attempts or papers discussing a similar approach?


Solution

Nothing in the standard

There's nothing like this in the standard. Format directives don't carry enough information to make this useful in any real sense. Unless the output was produced with *print-readably* bound to true, there is almost always some way for it to be hard to read back. In the case that you gave, with list formatting,

(destructure-format t "[~{~a~^, ~}]" "[0, 1, 2]")

any solution would have to examine the format control string. What could it then unambiguously conclude? That the first character of the string must be #\[, that the last must be #\], and that some occurrences of ", " within the string separate output generated by ~a. What ambiguities could arise, then? Anything else that would cause ", " to be written in the output. E.g.,

CL-USER> (format t "[~{~a~^, ~}]" '(|, | 2 3))
[, , 2, 3]
NIL
CL-USER> (format t "[~{~a~^, ~}]" '(|, | | ,|))
[, ,  ,]
NIL
CL-USER> (format t "[~{~a~^, ~}]" '(|, | | ,| |,|))
[, ,  ,, ,]
NIL
CL-USER> (format t "[~{~a~^, ~}]" '(|, | | ,| #\,))
[, ,  ,, ,]
NIL
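
The same point can be checked without reading the output by eye. The following is a minimal sketch; the names *control* and same-output-p are made up for illustration. It compares the strings that two different argument lists produce under the control string from the question.

```lisp
;; The control string from the question.
(defparameter *control* "[~{~a~^, ~}]")

(defun same-output-p (args-1 args-2)
  "True if both argument lists print identically under *CONTROL*."
  (string= (format nil *control* args-1)
           (format nil *control* args-2)))

;; Each of these calls returns true, so no inverse of FORMAT could
;; tell the two argument lists apart from the output alone.
(same-output-p '(|, | | ,| |,|) '(|, | | ,| #\,)) ; a symbol vs. a character
(same-output-p '("a, b") '("a" "b"))              ; one string vs. two strings
(same-output-p '(1 2) '("1" "2"))                 ; integers vs. strings
```

The last two pairs show that the problem is not limited to odd symbol names: ~a discards both the element boundaries and the types of the arguments.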

Third party libraries

Library recommendations are off-topic on Stack Overflow, but this question didn't start as one. After seeing Rörd's answer, which suggests calling C's scanf through a foreign function interface, I searched for scanf on CLiki and found format-setf (rereading the comments, I see that Xach found it first). Its description reads:

The Common Lisp equivalent of scanf().

A (relatively) frequently asked question on comp.lang.lisp is "What's the equivalent to scanf()?". The usual answer is "There isn't one, because it's too hard to work out what should happen". Which is fair enough.

However, one year Christophe was bored during exams, so he wrote format-setf.lisp, which may do what you want.

It should be pointed out that currently the behaviour of this program is unspecified, in more senses than just the clobbering of symbols in the "CL" package. What would be nice would be to see a specification appear for its behaviour, so that I don't have an excuse when people say that it's buggy.

Other alternatives

Since you'd really end up asking "What are the possible ways that this could match?", you'd essentially be asking for a regular expression plus the extra things that format makes possible.

What I have in mind exactly is rather an alternative to regular expressions for string parsing.

If you're looking for regular expressions, then regular expressions are a great fit. If you're looking for parsing that's not regular expressions, then you probably want to write a genuine parser. It can be daunting the first time, but after that, it gets much easier, and Common Lisp makes it relatively painless. There are even parser generation libraries available. If, on the other hand, you're looking for serialization and de-serialization, the Common Lisp reader and writer make s-expressions a nice and easy choice.
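
To give a feel for the last two options, here is a minimal sketch using only standard Common Lisp. The function names parse-bracketed-integers and round-trip are hypothetical, and the parser assumes the input has exactly the shape "[n, n, ...]" from the question, with integer elements only.

```lisp
(defun parse-bracketed-integers (string)
  "Parse a string such as \"[0, 1, 2]\" into a list of integers.
Signals an error if STRING does not have that shape."
  (let ((end (1- (length string))))     ; index of the closing bracket
    (unless (and (plusp end)
                 (char= (char string 0) #\[)
                 (char= (char string end) #\]))
      (error "Expected a string of the form [n, n, ...]: ~s" string))
    (loop with pos = 1
          while (< pos end)
          collect (multiple-value-bind (value next)
                      (parse-integer string :start pos :end end
                                            :junk-allowed t)
                    (unless value
                      (error "Expected an integer at position ~d in ~s"
                             pos string))
                    (setf pos next)
                    value)
          ;; Unless the closing bracket comes next, a ", " separator
          ;; followed by at least one more character is required.
          do (when (< pos end)
               (let ((after (+ pos 2)))
                 (unless (and (< after end)
                              (string= ", " string :start2 pos :end2 after))
                   (error "Expected \", \" at position ~d in ~s" pos string))
                 (setf pos after))))))

(defun round-trip (object)
  "Print OBJECT readably to a string, then read it back."
  (with-standard-io-syntax
    (let ((*read-eval* nil))            ; never evaluate #. while reading
      (values (read-from-string (prin1-to-string object))))))

(parse-bracketed-integers "[0, 1, 2]")  ; => (0 1 2)
(round-trip '(0 "1, 2" 3))              ; => (0 "1, 2" 3)
```

The parser has to spell out every decision that a format control string leaves implicit, such as what counts as a separator and what an element may contain. The reader and printer avoid the question entirely, because readable output is designed to be read back.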

Other tips

If you want format-string-based parsing but don't need the advanced features of format, you could use C's sscanf function through an FFI. Here's an example of doing this with CFFI:

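;; Assumes CFFI is loaded and its symbols are accessible, e.g. after
;; (ql:quickload :cffi) and (use-package :cffi).
(with-foreign-strings ((input "[0, 1, 2]") (format "[%d, %d, %d]"))
  (with-foreign-objects ((a :int) (b :int) (c :int))
    ;; sscanf stores the three parsed integers through the pointers.
    (foreign-funcall "sscanf" :pointer input :pointer format
                     :pointer a :pointer b :pointer c
                     :int)
    (loop for x in (list a b c) collect (mem-ref x :int))))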
Licensed under: CC-BY-SA with attribution