Question

I'm looking over a document that describes various techniques to improve performance of Lua script code, and I'm shocked that such tricks would be required. (Although I'm quoting Lua, I've seen similar hacks in Javascript).

Why would this optimization be required:

For instance, the code

for i = 1, 1000000 do 
   local x = math.sin(i) 
end

runs 30% slower than this one:

local sin = math.sin 
for i = 1, 1000000 do
    local x = sin(i) 
end

They're re-declaring sin function locally.

Why would this be helpful? It's the job of the compiler to do that anyway. Why is the programmer having to do the compiler's job?

I've seen similar things in Javascript; and so obviously there must be a very good reason why the interpreting compiler isn't doing its job. What is it?


I see it repeatedly in the Lua environment I'm fiddling in; people redeclaring variables as local:

local strfind = strfind
local strlen = strlen
local gsub = gsub
local pairs = pairs
local ipairs = ipairs
local type = type
local tinsert = tinsert
local tremove = tremove
local unpack = unpack
local max = max
local min = min
local floor = floor
local ceil = ceil
local loadstring = loadstring
local tostring = tostring
local setmetatable = setmetatable
local getmetatable = getmetatable
local format = format
local sin = math.sin

What is going on here that people have to do the work of the compiler? Is the compiler confused by how to find format? Why is this an issue that a programmer has to deal with? Why would this not have been taken care of in 1993?


I also seem to have hit a logical paradox:

  1. Optimization should not be done without profiling
  2. Lua has no ability to be profiled
  3. Lua should not be optimized
Was it helpful?

Solution

Why would this be helpful? It's the job of the compiler to do that anyway. Why is the programmer having to do the compiler's job?

Lua is a dynamic language. Compilers can do a lot of reasoning in static languages, like pulling constant expressions out of the loop. In dynamic languages, the situation is a bit different.

Lua's main (and also only) data structure is the table. math is also just a table, even though it is used as a namespace here. Nobody can stop you from modifying the math.sin function somewhere in the loop (even thought that would be an unwise thing to do), and the compiler cannot know that when compiling the code. Therefore the compiler does exactly what you instruct it to do: in every iteration of the loop, lookup the sin function in the math table and call it.

Now, if YOU know that you are not going to modify math.sin (i.e. you are going to call the same function), you can save it in a local variable outside the loop. Because there are no table lookups, the resulting code is faster.

The situation is a bit different with LuaJIT - it uses tracing and some advanced magic to see what your code is doing in runtime, so it can actually optimize the loop by moving the expression outside of the loop, and other optimizations, apart from actually compiling it to machine code, making it crazy fast.

Regarding the the 'redeclaring variables as local' - many times when defining a module, you want to work with the original function. When accessing pairs, max or anything using their global variables, nobody can assure you that it will be the same function every call. For example stdlib redefines a lot of global functions.

By creating a local variable with the same name as the global, you essentially store the function into a local variable, and because local variables (which are lexically scoped, meaning they are visible in the current scope and any nested scopes too) take precedence before globals, you make sure to always call the same function. Should someone modify the global later, it will not affect your module. Not to mention it is also faster, because globals are looked up in a global table (_G).

Update: I just read Lua Performance Tips by Roberto Ierusalimschy, one of Lua authors, and it pretty much explains everything that you need to know about Lua, performance and optimization. IMO the most important rules are:

Rule #1: Don’t do it.

Rule #2: Don’t do it yet. (for experts only)

OTHER TIPS

The reason why it's not done by default, I don't know. Why it's faster however is because locals get written to a register, while a global means looking it up in a table (_G), which is known to be somewhat slower.

As for the visibility (like with the format function): A local obscures the global. So if you declare a local function with the same name as a global, the local will be used instead as long as it is in scope. If you would want to use the global function instead, use _G.function.

If you really want fast Lua, you could try LuaJIT

i see it repeatedly in the Lua environment i'm fiddling in; people redeclaring variables as local:

Doing that by default is plain wrong.

It is arguably useful to use local references instead of table accesses when a function is used over and over again, like inside your example loop:

local sin = math.sin 
for i = 1, 1000000 do
  local x = sin(i) 
end

However, outside loops, the overhead of adding a table access is completely negligible.

What is going on here that people have to do the work of the compiler?

Because the two code samples you made above don't mean exactly the same thing.

It's not like the function can change while my function is running.

Lua is a very dynamic language, and you can't make the same assumptions than in other more restrictive languages, like C. The function can change while your loop is running. Given the dynamic nature of the language, the compiler can not assume that the function will not change. Or at least not without a complex analysis of your code and its ramifications.

The trick is that, even if your two pieces of code look equivalent, in Lua they are not. On the first one you are explicitly telling it to "get the sin function inside the math table on every iteration". On the second one you are using a single reference to the same function again and again.

Consider this:

-- The first 500000 will be sines, the rest will be cosines
for i = 1, 1000000 do 
   local x = math.sin(i)
   if i==500000 then math.sin = math.cos end 
end

-- All will be sines, even if math.sin is changed
local sin = math.sin
for i = 1, 1000000 do 
   local x = sin(i)
   if i==500000 then math.sin = math.cos end 
end

Storing functions in local variables removes the table indexing to look up the function key each iteration of the loop, the math ones are obvious, as it needs to lookup the hash in the Math table, the others aren't, they are indexed into the _G (global table), which is now _ENV (environment table) as of 5.2.

Also, one should be able to profile lua using its debug hooks API, or by using the lua debuggers lying around.

My assumption is that in the optimized version, because the reference to the function is stored in a local variable, a tree traversal doesn't have to be done on every iteration of the for loop (for lookup to math.sin).

I'm not sure about the local references set to function names, but I'd assume that there's some sort of a global namespace lookup required if a local one isn't found.

Then again, I could be WAY off base ;)

Edit: I also assume that the Lua compiler is dumb (which is a general assumption for me about compilers anyway ;))

This isn't just a bug/feature of Lua, many languages including Java and C will perform faster if you access local values instead of values out of scope, such as from a class or array.

In C++ for example, it's faster to access a local member than it would be to access variable members of some class.

This would count to 10,000 faster:

for(int i = 0; i < 10000, i++)
{
}

than:

for(myClass.i = 0; myClass.i < 10000; myClass.i++)
{
}

The reason Lua holds global values inside a table is because it allows the programmer to quickly save and change the global environment simply by changing the table that _G references. I agree that it would be nice to have some 'syntatic sugar' that treated the global table _G as a special case; rewriting them all as local variables in the file scope (or something similar), of course there is nothing stopping us from doing this ourselves; maybe a function optGlobalEnv(...) that 'localizes' the _G table and it's members / values to the 'file scope' using unpack() or something.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top