Question

I am currently writing a function which uses the UNIX ls -m command to list a bunch of files and then transforms the output into a list using a regex.

My function is as follows:

    import scala.sys.process._
    import scala.util.matching.Regex

    def genFileList(path: String = "~"): Iterator[String] = {
      val fileSeparatorRegex: Regex = "(.*),".r
      val fullCommand: String = s"ls -m $path"
      val rawFileList: String = fullCommand.!!
      val files: Iterator[String] = fileSeparatorRegex.findAllIn(rawFileList).matchData.map(_.group(1))
      var debug: List[String] = files.toList
      debug
      files
    }

For example: let's assume I have a folder called test with 3 files: test.txt test1.txt test2.txt. The resulting list is:

[image: res1]

Very strange...

Let's change the function to:

    def genFileList(path: String = "~"): Iterator[String] = {
      val fileSeparatorRegex: Regex = "(.*)\\n".r // Changed to match newline
      val fullCommand: String = s"ls -1 $path" // Changed to give file names separated by newlines
      val rawFileList: String = fullCommand.!!
      val files: Iterator[String] = fileSeparatorRegex.findAllIn(rawFileList).matchData.map(_.group(1))
      var debug: List[String] = files.toList
      debug
      files
    }

Tadaaaa:

[image of the resulting list]

Can anybody help me make sense of the first case failing? Why do the commas generated by ls -m not get matched?


Solution

(.*) is a greedy pattern: it tries to match as much as it can, including the commas.

test1.txt, test2.txt, test3.txt
^------------------^^
  all of this is    |
  matched by .*     this is matched by ,

The last chunk is not matched, because it's not followed by a comma.

You can use non-greedy matching using .*?
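To make the difference concrete, here is a small sketch that runs both patterns over the sample output from the diagram above. The object name GreedyVsLazy and the helper captures are made up for illustration, and the input string is typed in by hand instead of coming from ls.

```scala
import scala.util.matching.Regex

object GreedyVsLazy {
  // Sample `ls -m` style output, copied from the example above.
  val raw: String = "test1.txt, test2.txt, test3.txt\n"

  val greedy: Regex = "(.*),".r   // .* runs up to the last comma on the line
  val lazyRe: Regex = "(.*?),".r  // .*? stops at the first comma it reaches

  // Collects capture group 1 of every match into a List.
  def captures(re: Regex, input: String): List[String] =
    re.findAllMatchIn(input).map(_.group(1)).toList

  val greedyResult: List[String] = captures(greedy, raw)
  // List("test1.txt, test2.txt")

  val lazyResult: List[String] = captures(lazyRe, raw)
  // List("test1.txt", " test2.txt")
}
```

Note that even the non-greedy version keeps the leading space on the second name and still drops test3.txt, since nothing after it is a comma.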

Alternatively, you can just do rawFileList.stripSuffix("\n").split(", ").toList

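Also: "ls -m ~".!! doesn't work, because no shell is involved to expand ~; splitting the output on commas breaks if a filename contains a comma; s"ls -m $path".!! is asking for injection, since the interpolated path is split on whitespace into separate arguments; and new File(path).list() is way better in all aspects.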
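A minimal sketch of the File-based route, assuming you are fine with returning a List[String] instead of an Iterator[String] and with defaulting to the user.home system property in place of "~". The wrapper object FileListing is a name invented here, and the sorting is an extra, not something the original function did.

```scala
import java.io.File

object FileListing {
  // Hypothetical replacement for genFileList: no external process, no regex.
  def genFileList(path: String = System.getProperty("user.home")): List[String] = {
    // File.list() returns null when path is not a readable directory,
    // so wrap it in Option and fall back to an empty list.
    // File.list() guarantees no order, hence the explicit sort.
    Option(new File(path).list()).map(_.toList.sorted).getOrElse(Nil)
  }
}
```

Because the names come straight from the filesystem API, commas, spaces, or newlines inside a filename no longer matter.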

Other suggestions

I can see two problems with your initial approach. The first is that the * in your regex is greedy, which means it's sucking up as much as possible before reaching a comma, including other commas. If you change it to non-greedy by adding a ? (i.e. "(.*?),".r) it will only match up to the first comma.

The second problem is that there's no comma following the last file (naturally), so it won't be found by the regex. In your second approach you're getting all three files because there's a newline after each of them. If you want to stick with commas you'd be better off using split (e.g. rawFileList.split(",")).
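If you do stay with ls -m and split, something along these lines may be a bit more forgiving. It is only a sketch: parseCommaList and SplitListing are hypothetical names, and it assumes ls may wrap a long listing, so a comma can be followed by a newline instead of a space.

```scala
object SplitListing {
  // Splits on a comma plus any whitespace after it (a space, or a newline
  // where the listing wraps), then drops empty entries.
  def parseCommaList(raw: String): List[String] =
    raw.trim.split(",\\s*").toList.filter(_.nonEmpty)
}
```

For example, SplitListing.parseCommaList("test1.txt, test2.txt, test3.txt\n") gives List("test1.txt", "test2.txt", "test3.txt"). A filename that itself contains a comma will still be split in two, which no amount of regex tuning can fix.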

You might also consider using the list or listFiles methods on java.io.File:

    scala> val dir = new java.io.File(".")
    dir: java.io.File = .

    scala> dir.list
    res0: Array[String] = Array(test, test1.txt, test2.txt)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow