In the first one, instead of:
File.open(ARGV[0], "r").each_line do |line|
Use:
File.foreach(ARGV[0]) do |line|
And instead of:
incr += 1
if incr % 3 == 0
Use:
if $. % 3 == 0
$. is a global variable that holds the number of the last line read, so the manual counter isn't needed.
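Here's a minimal sketch of both changes together. The method name every_third_line is made up for illustration, and a temporary file stands in for ARGV[0]:

```ruby
require 'tempfile'

# Hypothetical helper: collect every third line of a file, using $.
# (the number of the last line read) instead of a manual counter.
def every_third_line(path)
  picked = []
  File.foreach(path) do |line|
    picked << line.chomp if $. % 3 == 0
  end
  picked
end

# A temporary file stands in for ARGV[0] here.
Tempfile.create('lines') do |f|
  f.write((1..7).map { |n| "row #{n}\n" }.join)
  f.flush
  p every_third_line(f.path) # => ["row 3", "row 6"]
end
```

If you'd rather not rely on a global, File.foreach(path).with_index(1) gives you the same 1-based count as a block argument.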
In the second one, instead of:
line.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}
Use:
line.tr('()', '').split(',').map(&:to_i)
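As a quick check that the rewrite gives the same result, here's a small sketch. The sample line is an assumption about your input format, and the method names are made up for illustration. String#delete is included because it does the same job as tr with an empty replacement:

```ruby
# Hypothetical sample line; the real input format comes from the question.
SAMPLE = '(1,2,3)'

# The original approach: two gsub passes, then split and convert.
def parse_with_gsub(line)
  line.gsub('(', '').gsub(')', '').split(',').map { |s| s.to_i }
end

# tr with an empty replacement string deletes the listed characters.
def parse_with_tr(line)
  line.tr('()', '').split(',').map(&:to_i)
end

# delete removes the listed characters directly.
def parse_with_delete(line)
  line.delete('()').split(',').map(&:to_i)
end

p parse_with_gsub(SAMPLE)   # => [1, 2, 3]
p parse_with_tr(SAMPLE)     # => [1, 2, 3]
p parse_with_delete(SAMPLE) # => [1, 2, 3]
```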
In the third one, instead of:
line.split("),(").map{ |s| s.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}}
Use:
line.scan(/(?:\d+,?)+/).map{ |s| s.split(',', 0).map(&:to_i) }
Here's how that line works:
line.scan(/(?:\d+,?)+/)
=> ["1,2,3,", "1,2,3,"]
line.scan(/(?:\d+,?)+/).map{ |s| s.split(',',0) }
=> [["1", "2", "3"], ["1", "2", "3"]]
line.scan(/(?:\d+,?)+/).map{ |s| s.split(',', 0).map(&:to_i) }
=> [[1, 2, 3], [1, 2, 3]]
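The second argument to split is a limit. Here's a small sketch of what it does to a chunk with a trailing comma; CHUNK and split_fields are made-up names for illustration. A limit of 0 behaves like the default and drops trailing empty fields, while a negative limit keeps them:

```ruby
# Hypothetical captured chunk with a trailing comma, like the scan output above.
CHUNK = '1,2,3,'

# Thin wrapper so the limit is the only thing that varies.
def split_fields(chunk, limit)
  chunk.split(',', limit)
end

p split_fields(CHUNK, 0)  # => ["1", "2", "3"]
p split_fields(CHUNK, -1) # => ["1", "2", "3", ""]
p CHUNK.split(',')        # => ["1", "2", "3"]
```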
I didn't run any benchmarks to compare speed, but the changes should be faster too because the gsub calls are gone. The changes I made aren't necessarily the fastest ways to do things; they're more-optimized versions of your own code.
Trying to compare the speed of Ruby to other languages requires knowledge of the fastest ways of accomplishing each step, based on multiple benchmarks of that step. It also implies you're running on identical hardware and OS and your languages are all compiled to their most efficient-for-speed forms. Languages make tradeoffs of memory use vs. speed, so, while one might be slower than another, it also might be more memory efficient.
Plus, when coding in a production environment, the time to produce code that works correctly has to be factored into the "which is faster" equation. C is extremely fast, but for most problems a C program takes longer to write than the Ruby equivalent, because C doesn't hold your hand like Ruby does. Which is faster when the C code takes a week to write and debug, vs. the Ruby code that took an hour? Just stuff to think about.
I didn't read through @tadman's answer and the comments until I finished. Using:
map(&:to_i)
used to be slower than:
map{ |s| s.to_i }
The speed difference depends on the version of Ruby you're running. Originally, the &: shorthand was implemented in some monkey-patches, but now it's built into Ruby. When they made that change it sped up a lot:
require 'benchmark'
foo = [*('1'..'1000')] * 1000
puts foo.size
N = 10
puts "N=#{N}"
puts RUBY_VERSION
puts
Benchmark.bm(6) do |x|
x.report('&:to_i') { N.times { foo.map(&:to_i) }}
x.report('to_i') { N.times { foo.map{ |s| s.to_i } }}
end
Which outputs:
1000000
N=10
2.0.0
user system total real
&:to_i 1.240000 0.000000 1.240000 ( 1.250948)
to_i 1.400000 0.000000 1.400000 ( 1.410763)
That's going through 10,000,000 elements, which only resulted in a difference of about 0.16 seconds. It's not much of a difference between the two ways of doing the same thing. If you're going to be processing a lot more data then it matters. For most applications it's a moot point because other things will be the bottlenecks, so write the code whichever way works for you, with that speed difference in mind.
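To put the gap between those two timings in per-element terms, here's the arithmetic as a small sketch. The constant and method names are made up; the numbers are the "real" column from the Ruby 2.0.0 run above:

```ruby
# "real" timings copied from the Ruby 2.0.0 benchmark output above.
SYMBOL_REAL = 1.250948 # map(&:to_i)
BLOCK_REAL  = 1.410763 # map{ |s| s.to_i }
ELEMENTS    = 1_000_000 * 10 # array size times N

# Difference per element, in nanoseconds.
def per_element_ns(fast, slow, count)
  (slow - fast) / count * 1_000_000_000
end

puts (BLOCK_REAL - SYMBOL_REAL).round(3)                          # total gap in seconds
puts per_element_ns(SYMBOL_REAL, BLOCK_REAL, ELEMENTS).round(1)   # gap per element in ns
```

That works out to roughly 16 nanoseconds per element on that machine and Ruby version, which is why it only matters for very large inputs.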
To show the difference the Ruby version makes, here are the same benchmark's results using Ruby 1.8.7:
1000000
N=10
1.8.7
user system total real
&:to_i 4.940000 0.000000 4.940000 ( 4.945604)
to_i 2.390000 0.000000 2.390000 ( 2.396693)
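For anyone unfamiliar with what the & shorthand does: it calls to_proc on the symbol and passes the resulting proc as the block. Here's a minimal sketch; the constant and method names are made up for illustration:

```ruby
# Hypothetical sample data.
WORDS = %w[1 20 300]

# Explicit block form.
def convert_with_block(strings)
  strings.map { |s| s.to_i }
end

# Shorthand form: & turns the symbol into a block via Symbol#to_proc.
def convert_with_symbol(strings)
  strings.map(&:to_i)
end

p convert_with_block(WORDS)  # => [1, 20, 300]
p convert_with_symbol(WORDS) # => [1, 20, 300]

# The proc can also be built and called by hand.
to_i_proc = :to_i.to_proc
p to_i_proc.call('42')       # => 42
```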
As for gsub vs. tr:
require 'benchmark'
foo = '()' * 500000
puts foo.size
N = 10
puts "N=#{N}"
puts RUBY_VERSION
puts
Benchmark.bm(6) do |x|
x.report('tr') { N.times { foo.tr('()', '') }}
x.report('gsub') { N.times { foo.gsub(/[()]/, '') }}
end
With these results:
1000000
N=10
1.8.7
user system total real
tr 0.010000 0.000000 0.010000 ( 0.011652)
gsub 3.010000 0.000000 3.010000 ( 3.014059)
and:
1000000
N=10
2.0.0
user system total real
tr 0.020000 0.000000 0.020000 ( 0.017230)
gsub 1.900000 0.000000 1.900000 ( 1.904083)
Here's the sort of difference that changing the regex pattern makes. Each pattern captures something slightly different, which changes the processing needed afterward to get the desired result:
require 'benchmark'
line = '((1,2,3),(1,2,3))'
pattern1 = /\([\d,]+\)/
pattern2 = /\(([\d,]+)\)/
pattern3 = /\((?:\d+,?)+\)/
pattern4 = /\d(?:[\d,])+/
line.scan(pattern1) # => ["(1,2,3)", "(1,2,3)"]
line.scan(pattern2) # => [["1,2,3"], ["1,2,3"]]
line.scan(pattern3) # => ["(1,2,3)", "(1,2,3)"]
line.scan(pattern4) # => ["1,2,3", "1,2,3"]
line.scan(pattern1).map{ |s| s[1..-1].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
line.scan(pattern2).map{ |s| s[0].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
line.scan(pattern3).map{ |s| s[1..-1].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
line.scan(pattern4).map{ |s| s.split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
N = 1000000
Benchmark.bm(8) do |x|
x.report('pattern1') { N.times { line.scan(pattern1).map{ |s| s[1..-1].split(',').map(&:to_i) } }}
x.report('pattern2') { N.times { line.scan(pattern2).map{ |s| s[0].split(',').map(&:to_i) } }}
x.report('pattern3') { N.times { line.scan(pattern3).map{ |s| s[1..-1].split(',').map(&:to_i) } }}
x.report('pattern4') { N.times { line.scan(pattern4).map{ |s| s.split(',').map(&:to_i) } }}
end
On Ruby 2.0-p427:
user system total real
pattern1 5.610000 0.010000 5.620000 ( 5.606556)
pattern2 5.460000 0.000000 5.460000 ( 5.467228)
pattern3 5.730000 0.000000 5.730000 ( 5.731310)
pattern4 5.080000 0.010000 5.090000 ( 5.085965)
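One detail in the pattern1 and pattern3 variants is worth spelling out: s[1..-1] strips only the leading parenthesis, so the last field still ends in ")". The result is still correct because to_i stops at the first non-digit character. Here's a small sketch, with made-up names, that also shows s[1..-2] as a stricter alternative (I haven't benchmarked it):

```ruby
# Hypothetical captured chunk, as returned by scan with pattern1 or pattern3.
CAPTURED = '(1,2,3)'

# What the benchmarked code does: drop the first character only.
def loose_parse(chunk)
  chunk[1..-1].split(',').map(&:to_i)
end

# Stricter variant: drop both parentheses before splitting.
def strict_parse(chunk)
  chunk[1..-2].split(',').map(&:to_i)
end

p CAPTURED[1..-1]            # => "1,2,3)"
p CAPTURED[1..-1].split(',') # => ["1", "2", "3)"]
p '3)'.to_i                  # => 3
p loose_parse(CAPTURED)      # => [1, 2, 3]
p strict_parse(CAPTURED)     # => [1, 2, 3]
```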