Question

Currently, I am using this function:

function tokenize( str )
  local ret = {}
  string.gsub( str, "([-%w%p()%[%]®+]+)", function( s ) table.insert( ret, s ) end )
  return ret
end

The string can contain any character (as the pattern above suggests). I want to split the string into words on whitespace only, treating every other character as part of a word. I have seen the solution mentioned here, but it does not work for me, even on codepad.org (link). I am working in PtokaX, in case you are wondering. I have also tried

print( split( 'foo/bar/baz/test','/' ) )

too, but that doesn't work either. :(

Is there any other easier way to create the table?

Solution

Why don't you just match runs of non-space characters, instead of listing every other character?

function tokenize( str )
  local ret = {}
  string.gsub( str, "(%S+)", function( s ) table.insert( ret, s ) end )
  return ret
end
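
As a variation on the function above, the same idea can be sketched with string.gmatch, which iterates over the matches directly and avoids the callback. The name tokenize_gmatch and the sample string are only illustrative and are not part of the original answer; the sketch assumes Lua 5.1 or later.

```lua
-- Sketch: whitespace tokenizer using string.gmatch instead of string.gsub.
-- %S+ matches a run of one or more non-whitespace characters.
function tokenize_gmatch( str )
  local ret = {}
  for word in string.gmatch( str, "%S+" ) do
    ret[#ret + 1] = word
  end
  return ret
end

-- Punctuation stays attached to its word; only whitespace separates tokens.
local words = tokenize_gmatch( "foo-bar  [baz]\t(qux)+" )
for i, w in ipairs( words ) do
  print( i, w )
end
```

Runs of consecutive spaces or tabs are treated as a single separator, so no empty strings end up in the table.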

If you want to use other characters for splitting, the pattern set negation is also useful:

s='foo#bar!baz*'
s:gsub('([^#!%*]+)',function(s) print(s) end)
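
Lua has no built-in split function, so the split( 'foo/bar/baz/test', '/' ) call from the question needs a definition. A minimal sketch built on the same set negation might look like the following. The signature split( str, delims ) is an assumption made for illustration, delims is assumed to be a non-empty string of single-character delimiters, and empty fields between consecutive delimiters are dropped.

```lua
-- Sketch: split str on any of the characters listed in delims.
function split( str, delims )
  -- Escape every non-alphanumeric character with % so that characters
  -- such as ^, ] or - are taken literally inside the [^...] set.
  local escaped = delims:gsub( "%W", "%%%0" )
  local ret = {}
  for piece in str:gmatch( "[^" .. escaped .. "]+" ) do
    ret[#ret + 1] = piece
  end
  return ret
end

-- split returns a table, so join it before printing.
print( table.concat( split( "foo/bar/baz/test", "/" ), ", " ) )
-- prints: foo, bar, baz, test
```

Passing the returned table straight to print would show only a table address, which is why the example joins the pieces with table.concat first.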

See also: Patterns in the Lua Manual. Also keep in mind that Lua patterns are not regular expressions: they are lighter, but they have limitations (for example, there is no alternation operator).

OTHER TIPS

If you will be working with more advanced structures, I recommend LPeg.

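local lpeg = require "lpeg"
lpeg.locale(lpeg)  -- adds lpeg.space and the other locale-dependent classes

local str = "foo bar  baz"  -- sample input

-- A word is one or more non-space characters. Capture each word,
-- skip any whitespace before it, and collect the captures into a table.
local word = lpeg.C((1 - lpeg.space)^1)
local pattern = lpeg.Ct((lpeg.space^0 * word)^0)

local ret = lpeg.match(pattern, str)

for k, v in ipairs(ret) do
    print(k, v)
end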
Licensed under: CC-BY-SA with attribution