How to improve the performance of this piece of code?

Question 1

Two things:

The first time you run a function, the measured time includes the time needed to compile it. For an apples-to-apples comparison with a compiled function in Mathematica, run the function twice and time the second run. With your code I get:

elapsed time: 1.156531976 seconds (447764964 bytes allocated)

for the first run which includes the compile time and

elapsed time: 1.135681299 seconds (447520048 bytes allocated)

for the second run when you don't need to compile.
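The warm-up pattern can be sketched on a toy function. This is only an illustration: `sumsq` is a hypothetical name that does not appear in the question, and the size of the gap between the two timings depends on the machine and the Julia version.

```julia
# Hypothetical toy function, used only to illustrate the timing pattern.
sumsq(n) = sum(i^2 for i in 1:n)

t_first = @elapsed sumsq(10^6)    # first call: includes compilation time
t_second = @elapsed sumsq(10^6)   # second call: runs already-compiled code
println("first: ", t_first, " s, second: ", t_second, " s")
```

For a function as small as the one in the question, the compilation cost is a small share of the total, which is why the two timings above are so close.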

The second thing, and arguably the bigger one, is that you should avoid global variables in performance-critical code. This is the first tip in the Performance Tips section of the manual.

Here is the same code using local variables:

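function fdtd1d_local(steps, ie = 200)
    # Allocate the fields inside the function so they are local, not global.
    ez = zeros(ie + 1)
    hy = zeros(ie)
    for n in 1:steps
        # Update the electric field from differences of the magnetic field.
        for i in 2:ie
            ez[i] += hy[i] - hy[i-1]
        end
        ez[1] = sin(n / 10)  # sinusoidal source at the first grid point
        # Update the magnetic field from differences of the electric field.
        for i in 1:ie
            hy[i] += ez[i+1] - ez[i]
        end
    end
    return (ez, hy)
end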

fdtd1d_local(10000)
@time fdtd1d_local(10000);

For comparison, your Mathematica code on my machine gives

{0.094005, Null}

while the result from @time for fdtd1d_local is:

elapsed time: 0.015188926 seconds (4176 bytes allocated)

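That is about 6 times faster than the Mathematica version, and about 75 times faster than the version with global variables (1.14 seconds). Global variables make a big difference.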
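If you would rather keep the arrays outside the function, another option is to pass them in as arguments. Here is a minimal sketch of that idea; the name `fdtd1d_args!` is hypothetical, and I have not timed this variant against the numbers above.

```julia
# Sketch: the arrays live outside the function but are passed in as arguments,
# so inside the function they are ordinary local variables.
# The name fdtd1d_args! is hypothetical (it is not in the original post).
function fdtd1d_args!(ez, hy, steps)
    ie = length(hy)
    for n in 1:steps
        for i in 2:ie
            ez[i] += hy[i] - hy[i-1]
        end
        ez[1] = sin(n / 10)
        for i in 1:ie
            hy[i] += ez[i+1] - ez[i]
        end
    end
    return ez, hy
end

ie = 200
ez = zeros(ie + 1)
hy = zeros(ie)
fdtd1d_args!(ez, hy, 10)          # warm-up call to trigger compilation
fill!(ez, 0.0); fill!(hy, 0.0)    # reset the fields before timing
@time fdtd1d_args!(ez, hy, 10000);
```

The function modifies `ez` and `hy` in place, hence the trailing `!` in the name, which is the usual Julia convention for mutating functions.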

Question 2

I prefer to keep the number of explicit loops small and to use loops only where they are required; array expressions can often take their place. Not every loop can be avoided, but removing some of them sped up this code. In the program above I replaced the two inner loops with array expressions, which cut the run time roughly in half.

ORIGINAL CODE :

ie = 200;
ez = zeros(ie + 1);
hy = zeros(ie);

fdtd1d(steps) =
    for n in 1:steps
        for i in 2:ie
            ez[i]+= (hy[i] - hy[i-1])
        end
        ez[1]= sin(n/10)
        for i in 1:ie
            hy[i]+= (ez[i+1]- ez[i])
        end
    end

@time fdtd1d(10000);

The output is

elapsed time: 1.845615295 seconds (239687888 bytes allocated)

OPTIMIZED CODE:

ie = 200;
ez = zeros(ie + 1);
hy = zeros(ie);

fdtd1d(steps) =
    for n in 1:steps
        ez[2:ie] = ez[2:ie] + hy[2:ie] - hy[1:ie-1]
        ez[1] = sin(n/10)
        hy[1:ie] = hy[1:ie] + ez[2:end] - ez[1:end-1]
    end

@time fdtd1d(10000);

OUTPUT

elapsed time: 0.93926323 seconds (206977748 bytes allocated)
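The array expressions can also be combined with local variables. The following is a sketch only, assuming a Julia 1.x release where the `@views` macro and dotted assignment (`.+=`) are available; `fdtd1d_vec` is a hypothetical name, and I have not timed it against the numbers above.

```julia
# Sketch: array expressions inside a function, operating on local arrays in place.
# Assumes Julia 1.x; fdtd1d_vec is a hypothetical name (not from the original post).
function fdtd1d_vec(steps, ie = 200)
    ez = zeros(ie + 1)
    hy = zeros(ie)
    for n in 1:steps
        # @views turns the slices into views instead of copies;
        # .+= and .- fuse into a single in-place loop per statement.
        @views ez[2:ie] .+= hy[2:ie] .- hy[1:ie-1]
        ez[1] = sin(n / 10)
        @views hy[1:ie] .+= ez[2:end] .- ez[1:end-1]
    end
    return ez, hy
end

fdtd1d_vec(10)              # warm-up call to trigger compilation
@time fdtd1d_vec(10000);
```

Plain slices such as `ez[2:ie]` on the right-hand side create a new array each time they are evaluated, which probably accounts for much of the allocation reported above; the views and dotted operators are meant to avoid those copies.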