Question

I'm trying to decode the following string:

body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}'
body << '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

I need the string to split at the pipes, but not where a pipe is contained within square brackets. To do this I think I need a lookahead, as described here: How to split string by ',' unless ',' is within brackets using Regex?

My attempt (still splits at every pipe):

x = self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/ *\|(?!\]) */)}
->
[
  ["type:paragaph", "class:red", "content:[class:intro", "body:This is the introduction paragraph.][body:This is the second paragraph.]"],
  ["type:image", "class:grid", "content:[id:1", "title:image1][id:2", "title:image2][id:3", "title:image3]"]
]

Expecting:

->
[
  ["type:paragaph", "class:red", "content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]"],
  ["type:image", "class:grid", "content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]"]
]

Does anyone know the regex required here?

Is it possible to adapt this regex? I can't seem to modify it correctly: Regular Expression to match underscores not surrounded by brackets?


I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:

 self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}

Seems to do the trick, though I'm not sure whether it has any shortfalls.


Solution 2

I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:

 self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}

Seems to do the trick. If it has any shortfalls please suggest something better.
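
For anyone wanting to check this against the sample from the question, here is a minimal sketch. The lookahead accepts a pipe only when the next bracket character after it is an opening "[" (or there is no bracket left at all), which is what "not inside an open [...] group" amounts to for flat input. The names TOP_LEVEL_PIPE and split_records are mine, not part of the original code. The last lines show one shortfall I'm aware of: if the groups were ever nested (an assumption, the question's data is flat), a pipe followed by an inner "[" would be split as well.

```ruby
body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}' +
       '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

# Hypothetical constant name. Matches a pipe (plus trailing whitespace) only if,
# scanning forward, an opening "[" or the end of the string comes before any "]".
TOP_LEVEL_PIPE = /\|\s*(?=[^\[\]]*(?:\[|$))/

# Hypothetical helper: pull out each {...} record, then split it on top-level pipes.
def split_records(text)
  text.scan(/\{(.*?)\}/).map { |m| m[0].split(TOP_LEVEL_PIPE) }
end

records = split_records(body)
records.each { |fields| p fields }

# Shortfall with nested groups (not present in the question's data):
# the pipe after "a" is followed by "b:" and then "[", so it is treated as top level.
nested = 'k:[a|b:[c]]'.split(TOP_LEVEL_PIPE)
p nested
```

The same pattern also relies on "{" and "}" never appearing inside a value, since the scan stops at the first closing brace.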

Other tips

Dealing with nested structures that have identical syntax is going to make things difficult for you.

You could try a recursive descent parser (a quick Google search turned up https://github.com/Ragmaanir/grammy; I'm not sure whether it is any good).
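
If you would rather not pull in a library, a hand-written recursive descent parser for this format is fairly small. The following is only a sketch built on Ruby's standard StringScanner. The grammar in the comments is my reading of the sample string, not a specification, and the class name RecordParser is made up for illustration. It assumes keys and plain values never contain the delimiter characters.

```ruby
require 'strscan'

# Hypothetical parser for the assumed grammar:
#   document := record*
#   record   := "{" fields "}"
#   fields   := field ("|" field)*
#   field    := key ":" value
#   value    := group+ | text
#   group    := "[" fields "]"
class RecordParser
  def initialize(text)
    @s = StringScanner.new(text)
  end

  # Returns an array with one hash per {...} record.
  def parse
    records = []
    records << parse_record until @s.eos?
    records
  end

  private

  def parse_record
    expect(/\{/)
    fields = parse_fields
    expect(/\}/)
    fields
  end

  def parse_fields
    fields = {}
    loop do
      key = expect(/[^:|\[\]{}]+/)
      expect(/:/)
      fields[key] = parse_value
      break unless @s.scan(/\|/)
    end
    fields
  end

  # A value is either plain text or one or more [...] groups (an array of hashes).
  def parse_value
    return @s.scan(/[^|\[\]{}]*/) unless @s.check(/\[/)

    groups = []
    while @s.scan(/\[/)
      groups << parse_fields
      expect(/\]/)
    end
    groups
  end

  # Consume the pattern at the current position or fail with the offset.
  def expect(pattern)
    @s.scan(pattern) or raise ArgumentError, "expected #{pattern.inspect} at offset #{@s.pos}"
  end
end

body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}' +
       '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

parsed = RecordParser.new(body).parse
p parsed
```

Because parse_value calls back into parse_fields, nested groups come along for free, and malformed input raises an error with the offset rather than silently giving a wrong split.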

Personally, I'd go for something really hacky - some gsubs that convert your string into JSON, then parse with a JSON parser :-). That's not particularly easy either, though, but here goes:

require 'json'

b1 = body.gsub(/([^\[\|\]\:\}\{]+)/,'"\1"').gsub(':[',':[{').gsub('][','},{').gsub(']','}]').gsub('}{','},{').gsub('|',',')


JSON.parse('[' + b1 + ']')  

It wasn't easy because the string format apparently uses [foo:bar][baz:bam] to represent an array of hashes. If you have a chance to modify the serialised format to make it easier, I would take it.
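
To make the chain of gsubs easier to follow, here it is again on a shorter input of my own, with one comment per step. The helper name to_json_text is hypothetical. As far as I can tell the approach assumes that values never contain double quotes, colons, or any of the delimiter characters, and every value comes back as a string (so "1", not 1).

```ruby
require 'json'

# Hypothetical helper wrapping the gsub chain, one transformation per line.
def to_json_text(text)
  text
    .gsub(/([^\[\|\]\:\}\{]+)/, '"\1"') # quote every run of non-delimiter characters
    .gsub(':[', ':[{')                  # start of a group list: open array and first hash
    .gsub('][', '},{')                  # boundary between adjacent groups
    .gsub(']', '}]')                    # end of the list: close last hash and the array
    .gsub('}{', '},{')                  # boundary between top-level records
    .gsub('|', ',')                     # field separator
end

sample = '{type:image|content:[id:1|title:a][id:2|title:b]}{type:text}'

json_text = to_json_text(sample)
puts json_text

data = JSON.parse('[' + json_text + ']')
p data
```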

Licensed under: CC-BY-SA with attribution