Question

I have data in Lisp form and I need to process it in RapidMiner. I am new to both Lisp and RapidMiner. RapidMiner doesn't accept the Lisp form (I guess because it is a programming language), so I probably need to convert it to CSV or something similar. Here is a small example:

(def-instance Adelphi
   (state newyork)
   (control private)
   (no-of-students thous:5-10)
   ...)
(def-instance Arizona-State
   (state arizona)
   (control state)
   (no-of-students thous:20+)
   ...)
(def-instance Boston-College
   (state massachusetts)
   (location suburban)
   (control private:roman-catholic)
   (no-of-students thous:5-10)
   ...)

I would be really grateful for any advice.


Solution

You can make use of the fact that Lisp's parser (the reader) is available to Lisp programs. One problem with this data is that some values contain colons, which Common Lisp uses as the package-name separator. I wrote some working Common Lisp code for your question, but I had to work around this problem by defining the appropriate packages.
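To illustrate what the reader gives you, here is a minimal sketch (parse-entry is a hypothetical helper for illustration only): read-from-string turns one def-instance entry into an ordinary list of symbols, and thous:20+ is only readable because a package named THOUS that exports 20+ exists first. Without that package, reading the value signals an error.

```lisp
;; Assumption: only 20+ is exported here; the real data needs every
;; thous:... value listed in the export clause.
(defpackage #:thous
  (:use)
  (:export #:20+))

(defun parse-entry (text)
  "Hypothetical helper: read one DEF-INSTANCE form from TEXT and return
(NAME . PROPERTIES), where PROPERTIES is a list of (KEY VALUE) lists."
  (destructuring-bind (head name &rest properties) (read-from-string text)
    (declare (ignore head)) ; the DEF-INSTANCE symbol itself is not needed
    (cons name properties)))

;; Example:
;; (parse-entry "(def-instance Arizona-State (state arizona) (no-of-students thous:20+))")
;; returns a list whose car is the symbol ARIZONA-STATE and whose cdr is
;; ((STATE ARIZONA) (NO-OF-STUDENTS THOUS:20+)).
```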

Here's the code. Of course, it has to be extended (following the same patterns it already uses) for everything you left out of the example in your question:

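;; Values such as thous:5-10 and private:roman-catholic look like
;; package-qualified symbols to the reader, so these packages must exist
;; (and export those symbols) before the data file is read.
(defpackage #:thous
  (:export #:5-10 #:20+))
(defpackage #:private
  (:export #:roman-catholic))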

(defstruct (college (:conc-name nil))
  (name "")
  (state "")
  (location "")
  (control "")
  (no-of-students ""))

(defun data->college (name data)
  (let ((college (make-college :name (write-to-string name :case :capitalize))))
    (loop for (key value) in data
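       ;; The printer may escape a symbol such as thous:5-10 as thous:|5-10|,
       ;; so any vertical bars are removed from the resulting string.
       for string = (remove #\| (write-to-string value :case :downcase))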
       do (case key
            (state (setf (state college) string))
            (location (setf (location college) string))
            (control (setf (control college) string))
            (no-of-students (setf (no-of-students college) string))))
    college))

(defun read-data (stream)
  (loop for (def-instance name . data) = (read stream nil nil)
     while def-instance
     collect (data->college name data)))

(defun print-college-as-csv (college stream)
  (format stream
          "~a~{,~a~}~%"
          (name college)
          (list (state college)
                (location college)
                (control college)
                (no-of-students college))))

(defun data->csv (in out)
  (let ((header (make-college :name "College"
                              :state "state"
                              :location "location"
                              :control "control"
                              :no-of-students "no-of-students")))
    (print-college-as-csv header out)
    (dolist (college (read-data in))
      (print-college-as-csv college out))))

(defun data-file-to-csv (input-file output-file)
  (with-open-file (in input-file)
    (with-open-file (out output-file
                         :direction :output
                         :if-does-not-exist :create
                         :if-exists :supersede)
      (data->csv in out))))

The main function is data-file-to-csv, which can be called as (data-file-to-csv "path-to-input-file" "path-to-output-file") from a Common Lisp REPL after this code has been loaded.

EDIT: some additional thoughts

Instead of adding package definitions for every value that contains a colon, it would actually be easier to run a regular-expression search and replace over the data that puts double quotes (") around all the values. Lisp then reads them as strings right away. In that case, the line for string = (remove #\| (write-to-string value :case :downcase)) could be removed, and string replaced with value in every clause of the case form.
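Here is a minimal sketch of that preprocessing step in Common Lisp itself. Standard Common Lisp has no regular expressions (CL-PPCRE is the usual third-party library for that), so this uses plain string functions instead, and it assumes, as in the example, that each (key value) pair sits on its own line and each def-instance header on its own line. quote-property-value and quote-values are hypothetical names.

```lisp
(defun quote-property-value (line)
  "If LINE looks like \"  (key value)\", return it with VALUE wrapped in
double quotes; otherwise return LINE unchanged."
  (let* ((open (position #\( line))
         (space (and open (position #\Space line :start open)))
         (close (and space (position #\) line :start space))))
    (if close
        (concatenate 'string
                     (subseq line 0 (1+ space))             ; "  (key "
                     "\"" (subseq line (1+ space) close) "\"" ; quoted value
                     (subseq line close))                    ; ")" and the rest
        ;; Header lines like "(def-instance Adelphi" have no ")" and stay as-is.
        line)))

(defun quote-values (text)
  "Apply QUOTE-PROPERTY-VALUE to every line of TEXT."
  (with-output-to-string (out)
    (with-input-from-string (in text)
      (loop for line = (read-line in nil nil)
            while line
            do (write-line (quote-property-value line) out)))))
```

After this step, read returns values such as "thous:5-10" as strings, so the THOUS and PRIVATE packages are no longer needed.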

Because the data is so regular, it shouldn't even be necessary to parse the Lisp definitions properly at all. Instead, you could simply extract the data with regular expressions; a language well suited to regex-based text transformation, such as AWK or Perl, would do the job fine.
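As a rough sketch of that idea, written in Common Lisp to stay with the language of this answer rather than AWK or Perl, the following scans the text line by line with plain string matching (not regular expressions) and never calls the Lisp reader, so colons in values need no special handling. It assumes each def-instance header and each (key value) pair is on its own line, as in the question. *columns*, parse-property-line and lines->csv are hypothetical names, and the column list has to be extended for the fields elided with ... in the question.

```lisp
(defparameter *columns* '("state" "location" "control" "no-of-students")
  "Hypothetical CSV column order; extend it for the fields left out of the example.")

(defun parse-property-line (line)
  "Return (KEY . VALUE) as strings for a line like \"  (state newyork)\", else NIL."
  (let* ((trimmed (string-trim '(#\Space #\Tab) line))
         (space (position #\Space trimmed))
         ;; Stop at the first ")" so a trailing "))" closing the entry is ignored.
         (close (and space (position #\) trimmed :start space))))
    (when (and close (char= (char trimmed 0) #\())
      (cons (subseq trimmed 1 space)
            (subseq trimmed (1+ space) close)))))

(defun lines->csv (lines out)
  "Write a header and one CSV row per def-instance entry in LINES to stream OUT."
  (let ((name nil)
        (fields '()))
    (flet ((flush ()
             ;; Emit the entry collected so far; missing fields become empty cells.
             (when name
               (format out "~a~{,~a~}~%" name
                       (mapcar (lambda (column)
                                 (or (cdr (assoc column fields :test #'string=)) ""))
                               *columns*)))))
      (format out "College~{,~a~}~%" *columns*)
      (dolist (line lines)
        (let ((trimmed (string-trim '(#\Space #\Tab) line)))
          (if (and (> (length trimmed) 13)
                   (string= "(def-instance" trimmed :end2 13))
              ;; A new entry starts: write out the previous one first.
              (progn (flush)
                     (setf name (string-trim '(#\Space #\Tab) (subseq trimmed 13))
                           fields '()))
              (let ((pair (parse-property-line line)))
                (when pair (push pair fields))))))
      (flush))))
```

In practice you would collect the lines of the data file with read-line and pass an output file stream as OUT. Values are written without CSV quoting, which is fine as long as none of them contains a comma.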

License: CC-BY-SA with attribution
Not affiliated with StackOverflow