Question

Problem: I need to extract certain keys from each record and count the results in a hash. As a sample, consider:

data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"}, 
        {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
        {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
        {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
        {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

I want to count the number of records for each [id (extracted from name), priority] pair, so that at the end I will have something like:

#{{"priority"=>"1", "id"=>"name1"}=>2, {"priority"=>"1", "id"=>"name2"}=>1, {"priority"=>"2", "id"=>"name2"}=>1}

I'm doing the following but I have a feeling that I'm overcomplicating it:

#!/usr/bin/env ruby

data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"}, 
       {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
       {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
       {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
       {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

# (1) trash some keys, just because I don't need them  
data.each do |d|
  d.delete 'owner'
  # in the real data I have about 4 or 5 that I'm trashing
  d['id'] = d['name'].scan(/[a-z][a-z][a-z][a-z][0-9]/)[0] # only valid ids
  d.delete 'name'
end

puts data
#output: 
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name2"}
#{"priority"=>"2", "id"=>"name2"}
#{"priority"=>"2", "id"=>nil}

# (2) reject invalid keys
data = data.reject { |d| d['id'].nil? }

puts data
#output: 
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name2"}
#{"priority"=>"2", "id"=>"name2"}

# (3) count
counts = Hash.new(0)
data.each do |d|
  counts[d] += 1
end

puts counts
#{{"priority"=>"1", "id"=>"name1"}=>2, {"priority"=>"1", "id"=>"name2"}=>1, {"priority"=>"2", "id"=>"name2"}=>1}

Any suggestions on improving my method of counting?


Solution

There are many ways to do this. Here are two solutions. The first was inspired by the approach you took, but I've tried to package it to be more Ruby-like. I'm not sure what constitutes a valid "name", so I've put that determination in a separate method that can be easily changed.

Code

def name_valid?(name)
  name[0..3] == "name"
end

data.each_with_object(Hash.new(0)) {|h,g|
  (g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1) if name_valid?(h["name"])}
  #=> {{"id"=>"name1", "priority"=>"1"}=>2,
  #    {"id"=>"name2", "priority"=>"1"}=>1,
  #    {"id"=>"name2", "priority"=>"2"}=>1}

Explanation

Enumerable#each_with_object is passed an initially-empty hash with default value zero (Hash.new(0)), which is represented by the block variable g; h is the current element of data. g is built up by adding entries created from the elements of data:

g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1

If the hash g has the key

{"id"=>h["name"],"priority"=>h["priority"]}

the value associated with the key is incremented by one. If g does not have this key,

g[{"id"=>h["name"],"priority"=>h["priority"]}]

evaluates to the default value, zero, when

g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1

is invoked, so the value becomes 1.
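
The default-value behaviour described above can be checked in isolation. This is only a minimal sketch; the names counts and key are illustrative and do not appear in the answer's code:

```ruby
# Illustrative names: `counts` and `key` are not part of the answer above.
counts = Hash.new(0)
key = { "id" => "name1", "priority" => "1" }

default_read  = counts[key]       # reading a missing key returns the default, 0
still_missing = counts.key?(key)  # the read alone does not store the key

counts[key] += 1  # shorthand for counts[key] = counts[key] + 1, so 0 + 1 is stored
counts[key] += 1  # the key now exists, so its value is simply incremented

# A separate hash with equal content is treated as the same key.
lookup = counts[{ "id" => "name1", "priority" => "1" }]
```

Because hashes with equal content compare as equal keys, every record with the same id and priority lands on the same counter.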

Alternative Method

Code

data.each_with_object({}) do |h,g|
  hash = { { "id"=>h["name"], "priority"=>h["priority"] } => 1 } 
  g.update(hash) { |k, vg, _| vg + 1 } if name_valid?(h["name"])
end
  #=> {{"id"=>"name1", "priority"=>"1"}=>2,
  #    {"id"=>"name2", "priority"=>"1"}=>1,
  #    {"id"=>"name2", "priority"=>"2"}=>1}

Explanation

Here, I've used Hash#update (aka Hash#merge!) to merge a one-pair hash, built from each element of data, into the initially-empty hash g (provided the value of "name" is valid). update's block

{ |k, vg, _| vg + 1 }

is invoked if and only if the merged hash (g) and the merging hash (hash) have the same key, k, in which case the value returned by the block becomes the new value for that key. Here that is vg + 1, where vg is the value g currently holds for k. Note the third block variable is the value for the key k in the hash hash. As we do not use that value, I've replaced it with the placeholder _.
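
To see when update's block runs, here is a small sketch with made-up keys ("a" and "b" are purely illustrative, as is the block_calls counter):

```ruby
# Illustrative data: the keys "a" and "b" and `block_calls` are assumptions
# made for this sketch, not part of the answer's code.
g = { "a" => 1 }
block_calls = 0

# "b" is not yet in g, so the pair is copied in and the block is not called.
g.update({ "b" => 1 }) { |_k, vg, _| block_calls += 1; vg + 1 }

# "a" is already in g, so the block is called and its result is stored.
g.update({ "a" => 1 }) { |_k, vg, _| block_calls += 1; vg + 1 }
```

After both calls, g maps "a" to 2 and "b" to 1, and the block has run exactly once.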

Other tips

Depending on what you mean by "something like", this might do the trick:

data.group_by { |h| [h["name"], h["priority"]] }.map { |k, v| { k => v.size } }

=> [{["name1", "1"]=>2}, {["name2", "1"]=>1}, {["name2", "2"]=>1}, {["nae954me2", "2"]=>1}] 
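
Note that the result above still includes the record with the invalid name. If a single hash of counts without invalid ids is wanted, one possible variation is sketched below. It assumes that the question's regular expression (four lowercase letters followed by a digit) is what defines a valid id; id_pattern and counts are names introduced only for this sketch:

```ruby
data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"},
        {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
        {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
        {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
        {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

# Assumption: same rule as the question's scan, written with a repetition count.
id_pattern = /[a-z]{4}[0-9]/

counts = data
  .map { |h| [h["name"][id_pattern], h["priority"]] }  # [id or nil, priority]
  .reject { |id, _| id.nil? }                          # drop records with no valid id
  .group_by { |pair| pair }                            # identical pairs end up together
  .map { |pair, group| [pair, group.size] }            # pair => number of records
  .to_h
```

Unlike the approach in the question, this leaves data itself unmodified.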
License: CC-BY-SA with attribution
Not affiliated with StackOverflow