Well, this is embarrassing... As it turns out, the relatively poor performance was due to running my test code as part of a simple web app (I know, that doesn't make sense in the context of this question), which I ran with lein ring server; I guess that is the same as running it from the REPL (I just didn't make that connection). When I compiled and packaged the app with lein uberjar and then executed that jar with java -jar, I got performance comparable to the Java app.
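For reference, that workflow looks something like this (the jar name is a placeholder; it depends on your project.clj settings):

```shell
# Build an ahead-of-time-compiled standalone jar instead of running
# the app through Leiningen's development server:
lein uberjar

# Run the packaged app directly on the JVM:
java -jar target/myapp-0.1.0-standalone.jar
```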
What is an efficient way to access large-ish datasets using JDBC with Clojure?

30-11-2021

Question
EDIT
N00b problem, as it turns out. I didn't realize that running lein ring server results in your app being run in interpreted mode, which is why it was much slower.
Can the following Clojure/JDBC fragment be optimized so that it runs (much) faster?
(defn test-sql []
  (sql/with-connection (db-connection)
    (sql/with-query-results results ["select * from users order by username asc"]
      (doseq [row results]
        (println "User" (row :first_name) (row :last_name))))))
I am considering using Clojure for an ETL project. The first test I wrote prints out data from a table with ~280K records in it. The implementations I have come up with so far have been quite slow: what takes ~12 seconds in Java (even using myBatis to populate objects rather than 'raw' access) takes ~9.5 minutes with my Clojure solution.
I tried map instead of doseq, and tried using a cursor as outlined here: http://asymmetrical-view.com/2010/10/14/clojure-and-large-result-sets.html, but I get about the same execution time for each.
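For reference, the cursor approach from that article amounts to passing an options map as the first element of the sql-params vector; a minimal sketch against the contrib-era clojure.java.jdbc API (the option values are illustrative, and db-connection is the connection map used above):

```clojure
(defn test-sql-cursor []
  (sql/with-connection (db-connection)
    ;; A transaction is required for some drivers (e.g. PostgreSQL)
    ;; to honor :fetch-size and stream rows instead of buffering
    ;; the whole result set in memory.
    (sql/transaction
      (sql/with-query-results results
        [{:fetch-size 1000            ; rows per round trip
          :concurrency :read-only
          :result-type :forward-only}
         "select * from users order by username asc"]
        (doseq [row results]
          (println "User" (row :first_name) (row :last_name)))))))
```

Given how the question was eventually resolved (the development server was the real culprit), this mainly matters for keeping memory bounded on large result sets rather than for raw speed.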
FWIW, I get the same result when printing via (.println java.lang.System/out ...) (not surprising), and when using with-query-results*:
(defn test-sql2 []
  (sql/with-connection (db-connection)
    (sql/with-query-results* ["select * from users order by username asc"]
      (fn [row] (println "User" (row :first_name) (row :last_name))))))
same, same.
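One caveat about that last variant, as an assumption worth double-checking against the clojure.java.jdbc source: with-query-results* appears to invoke the supplied function once with the whole lazy result seq, not once per row, so a per-row callback would be spelled more like:

```clojure
(defn test-sql3 []
  (sql/with-connection (db-connection)
    (sql/with-query-results* ["select * from users order by username asc"]
      (fn [results]
        ;; results is the full lazy seq of row maps
        (doseq [row results]
          (println "User" (row :first_name) (row :last_name)))))))
```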
Solution