Question

(def evil-code (str "(" (slurp "/mnt/src/git/clj/clojure/src/clj/clojure/core.clj") ")" ))
(def r (read-string evil-code ))

Works, but unsafe

(def r (clojure.edn/read-string evil-code))
RuntimeException Map literal must contain an even number of forms  clojure.lang.Util.runtimeException (Util.java:219)

Does not work...

How can I safely read Clojure code into a tree (ideally preserving every '#' form as itself)? Imagine a Clojure antivirus that wants to scan code for threats and work with data structures rather than plain text.

Solution

First of all, you should never read Clojure code directly from untrusted data sources. Use EDN or another serialization format instead.

That being said, since Clojure 1.5 there is a reasonably safe way to read strings without evaluating them: bind the *read-eval* var to false before calling read-string. In Clojure 1.4 and earlier this could still cause side effects, because Java constructors could be invoked at read time. Those problems have since been fixed.

Here is some example code:

(defn read-string-safely [s]
  (binding [*read-eval* false]
    (read-string s)))

(read-string-safely "#=(eval (def x 3))")
=> RuntimeException EvalReader not allowed when *read-eval* is false.  clojure.lang.Util.runtimeException (Util.java:219)

(read-string-safely "(def x 3)")
=> (def x 3)

(read-string-safely "#java.io.FileWriter[\"precious-file.txt\"]")
=> RuntimeException Record construction syntax can only be used when *read-eval* == true  clojure.lang.Util.runtimeException (Util.java:219)
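
To read a whole source file this way, rather than wrapping its contents in parentheses, one option is to call read repeatedly on a PushbackReader until the input runs out. The following is a minimal sketch, assuming Clojure 1.5 or later; read-all-safely and symbols-in are hypothetical helper names chosen for illustration, not part of any library.

```clojure
(import '(java.io PushbackReader StringReader))

(defn read-all-safely
  "Reads every top-level form from the string s with *read-eval* bound
  to false. Hypothetical helper, for illustration only."
  [s]
  (binding [*read-eval* false]
    (let [rdr (PushbackReader. (StringReader. s))
          eof (Object.)]                    ; unique end-of-input sentinel
      ;; The loop is eager, so all reading happens inside the binding.
      (loop [forms []]
        (let [form (read rdr false eof)]
          (if (identical? form eof)
            forms
            (recur (conj forms form))))))))

(defn symbols-in
  "Returns every symbol that appears anywhere inside form."
  [form]
  (filter symbol? (tree-seq coll? seq form)))

(read-all-safely "(def x 3) (println x)")
;; => [(def x 3) (println x)]

(symbols-in '(def x 3))
;; => (def x)
```

A scanner could then compare the result of symbols-in against a list of suspicious names. Reading is still not completely context-free: forms such as ::keyword and syntax-quote are resolved against the current namespace.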

Regarding reader macros

The dispatch macro (#) and tagged literals are processed at read time. Clojure data has no representation for them, because by the time you get the data these constructs have already been expanded. As far as I know, there is no built-in way to generate a syntax tree of Clojure code.
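
A few reads at the REPL illustrate the point: the reader returns the expanded form, and the original notation is gone. This is only a small sample of reader macros; read-safely is a hypothetical helper name.

```clojure
(defn read-safely [s]
  (binding [*read-eval* false]
    (read-string s)))

(read-safely "'x")         ; quote      => (quote x)
(read-safely "@a")         ; deref      => (clojure.core/deref a)
(read-safely "#'x")        ; var quote  => (var x)
(read-safely "[1 #_2 3]")  ; discard    => [1 3]

;; #(...) becomes a fn* form with generated parameter names
(first (read-safely "#(inc %)"))   ; => fn*

;; #"..." becomes a compiled regex object, not a list or a string
(class (read-safely "#\"a+\""))    ; => java.util.regex.Pattern
```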

You will have to use an external parser to retain that information. Either you roll your own custom parser, or you use a parser generator such as Instaparse or ANTLR. A complete Clojure grammar for either of those might be hard to find, but you could extend one of the EDN grammars to include the additional Clojure forms. A quick Google search turned up an ANTLR grammar for Clojure syntax; you could alter it to support any constructs that are missing.
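
As a rough idea of what the first step of a hand-rolled parser might look like, here is a minimal tokenizer sketch that keeps the '#' dispatch forms as literal text instead of expanding them. The names token-re and tokenize are hypothetical, and the regex is an assumption that covers only a small subset of Clojure syntax (character literals such as \( are not handled, for example).

```clojure
(def token-re
  "Token alternatives, tried in order: dispatch openers, regex literals,
  strings, comments, brackets, then any other run of non-delimiters."
  #"#\{|#\(|#'|#_|#\"(?:[^\"\\]|\\.)*\"|\"(?:[^\"\\]|\\.)*\"|;[^\n]*|[()\[\]{}]|[^\s,()\[\]{}\";]+")

(defn tokenize
  "Splits source text into a vector of token strings without reading
  or evaluating anything. Whitespace and commas are dropped."
  [s]
  (vec (re-seq token-re s)))

(tokenize "#(inc %)")
;; => ["#(" "inc" "%" ")"]

(tokenize "#=(eval x)")
;; => ["#=" "(" "eval" "x" ")"]
```

Because the tokens are plain strings, a scanner can flag "#=" directly, and a second pass can nest the tokens into a tree by matching brackets.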

There is also Sjacket, a library made for Clojure tools that need to retain information about the source code itself. It seems like a good fit for what you are trying to do, but I have no personal experience with it. Judging from the tests, its parser does support reader macros.

Other suggestions

According to the current documentation, you should never use read or read-string to read from untrusted data sources.

WARNING: You SHOULD NOT use clojure.core/read or
clojure.core/read-string to read data from untrusted sources.  They
were designed only for reading Clojure code and data from trusted
sources (e.g. files that you know you wrote yourself, and no one
else has permission to modify them).

You should use clojure.edn/read or clojure.edn/read-string instead, which were designed with that purpose in mind.
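
The EDN reader can also keep unknown tagged literals as data instead of rejecting them, which goes some way towards preserving '#' forms. A minimal sketch, assuming Clojure 1.7 or later (for tagged-literal); the tag my/tag is a made-up example.

```clojure
(require '[clojure.edn :as edn])

;; Plain data reads as usual.
(edn/read-string "{:a [1 2 #{3}]}")
;; => {:a [1 2 #{3}]}

;; Unknown tags normally throw; with :default they are kept as data.
(def tagged
  (edn/read-string {:default tagged-literal} "#my/tag [1 2]"))

(:tag tagged)   ; => my/tag
(:form tagged)  ; => [1 2]

;; The EDN reader has no #= dispatch at all, so this is rejected.
(try
  (edn/read-string "#=(+ 1 2)")
  (catch RuntimeException _ :rejected))
;; => :rejected
```

This only helps with input that is valid EDN. Clojure-only syntax, such as #(...) function literals, is not accepted by the EDN reader.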

There was a long discussion in the mailing list regarding the use of read and read-eval and best practices regarding those.

I wanted to point out an old library (used in LightTable) that uses read-string, with one such technique, to implement client/server communication:

Fetch : A ClojureScript library for Client/Server interaction.

See in particular the safe-read function:

(defn safe-read [s]
  (binding [*read-eval* false]
    (read-string s)))

Note how it binds *read-eval* to false. I think the rest of the code is worth looking at for the kind of abstractions it proposes.

In a PR, it is suggested that there is a security problem that can be fixed by using EDN instead (...aaand back to your question):

(require '[clojure.edn :as edn])

(defn safe-read [s]
  (edn/read-string s))
Licensed under: CC-BY-SA with attribution