ruby convert array to hash preserve duplicate key

Question 1

Here is one way to do it, collecting every value under its key:

ary = [
       "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
       "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
       "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
       "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
       "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
       "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
     ]

array_hash = ary.each_slice(2).with_object(Hash.new { |h,k| h[k] = []}) do |(k,v),hash|
  hash[k] << v 
end

# The main advantage here is that you don't lose any data: every value is kept, and you
# can use it however you need. I think that is a better way to deal with your situation.
array_hash
# => {"19d97e408ee3f993745b053e281ac9dc69519e06"=>
#      ["refs/heads/auto", "refs/heads/master"],
#     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>["refs/heads/callout_hooks"],
#     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>["refs/heads/elab"],
#     "d38a9a26ef887c08b306bdab210b39882f58e587"=>["refs/heads/elab_6.1"],
#     "906dfe6eebff832baf0f92683d751432fcc98ab7"=>["refs/heads/regression"]}
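Once the values are grouped like this, finding the keys that were duplicated is a one-liner. A minimal sketch, assuming shortened, made-up keys ("aaa", "bbb") in place of the real SHAs; the names grouped and duplicates are only for illustration:

```ruby
# Shortened, hypothetical keys standing in for the SHAs above.
ary = ["aaa", "refs/heads/auto",
       "bbb", "refs/heads/elab",
       "aaa", "refs/heads/master"]

# Same grouping as above: each key maps to an array of all its values.
grouped = ary.each_slice(2).with_object(Hash.new { |h, k| h[k] = [] }) do |(k, v), hash|
  hash[k] << v
end
# => {"aaa"=>["refs/heads/auto", "refs/heads/master"], "bbb"=>["refs/heads/elab"]}

# Keep only the keys that occurred more than once.
duplicates = grouped.select { |_key, refs| refs.size > 1 }
# => {"aaa"=>["refs/heads/auto", "refs/heads/master"]}
```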

Question 2

If you make a hash of hash_value => array of refs, you'll keep everything:

array = ["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
 "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
 "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
 "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
 "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
 "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
]

array.each_slice(2).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }
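To make the difference concrete, here is a small sketch comparing a plain pair-to-hash conversion with the reduce version. The short keys ("aaa", "bbb") are made up for readability, and flat and grouped are just illustrative names:

```ruby
# Shortened, hypothetical keys standing in for the SHAs above.
array = ["aaa", "refs/heads/auto",
         "bbb", "refs/heads/elab",
         "aaa", "refs/heads/master"]

# Plain conversion: a later pair silently overwrites an earlier one.
flat = array.each_slice(2).to_h
# => {"aaa"=>"refs/heads/master", "bbb"=>"refs/heads/elab"}

# Grouped conversion: every value is kept under its key.
grouped = array.each_slice(2).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }
# => {"aaa"=>["refs/heads/auto", "refs/heads/master"], "bbb"=>["refs/heads/elab"]}
```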

Looks like Arup and I were thinking the same way...

Question 3

You gave two options for what you want to do:

Get a list of the values which were dropped in the conversion
Make the keys unique by adding a special character to the key

I think the second approach is a bad idea, for a couple of reasons: a) you would need a way of modifying the key that allows for the possibility of there being multiple duplicates; and b) making connections between the original and the duplicates would be awkward. Also, it would be just plain ugly.
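To show what I mean, here is a rough sketch of the second approach. The "*<n>" suffix scheme, the short keys and the names seen, suffixed and all_for_aaa are all made up for illustration; nothing here is a recommendation:

```ruby
# Hypothetical scheme: append "*<n>" to the n-th repeat of a key.
arr = ["aaa", "x", "bbb", "y", "aaa", "z", "aaa", "w"]

seen = Hash.new(0) # how many times each original key has appeared so far
suffixed = arr.each_slice(2).with_object({}) do |(k, v), h|
  key = seen[k].zero? ? k : "#{k}*#{seen[k]}"
  seen[k] += 1
  h[key] = v
end
# => {"aaa"=>"x", "bbb"=>"y", "aaa*1"=>"z", "aaa*2"=>"w"}

# Getting back every value for "aaa" now means pattern-matching on the keys.
all_for_aaa = suffixed.select { |key, _| key == "aaa" || key.start_with?("aaa*") }.values
# => ["x", "z", "w"]
```

Note the extra counter needed to handle more than one duplicate, and the string matching needed to reconnect the duplicates with the original key.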

I see others have suggested a third possibility: changing the form of the resulting hash, so that the values are arrays of strings. That might serve you well, but it is not what you asked for, so I chose to build a list of the values that are dropped; i.e., all but the first for each key.

Code

def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{},[]]) { |(k,v),(h,ex)|
    h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov } }
end

Example

create_hash_and_save_extras(arr)
  #=> [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
  #     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
  #     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
  #     "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1",
  #     "906dfe6eebff832baf0f92683d751432fcc98ab7"=>"refs/heads/regression"},
  #   [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/master"}]]
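Since the method returns a two-element array, the caller can destructure it directly. Here is a self-contained sketch, assuming arr holds the array from the question; the names kept and extras are only for illustration, and the inner block variable is renamed key to avoid shadowing k:

```ruby
def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{}, []]) { |(k, v), (h, ex)|
    # The inner block runs only on a key clash: record the new pair, keep the old value.
    h.update({ k => v }) { |key, ov, nv| ex << { key => nv }; ov } }
end

arr = ["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
       "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
       "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8", "refs/heads/elab",
       "d38a9a26ef887c08b306bdab210b39882f58e587", "refs/heads/elab_6.1",
       "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/master",
       "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"]

# Destructure the [hash, extras] pair returned by the method.
kept, extras = create_hash_and_save_extras(arr)

kept.size  # => 5
kept["19d97e408ee3f993745b053e281ac9dc69519e06"]  # => "refs/heads/auto"
extras     # => [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/master"}]
```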

Explanation

Enumerable#each_slice sent to arr returns an enumerator:

enum1 = arr.each_slice(2)
  #=> #<Enumerator: [
  #      "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  #      "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  #      ...
  #      "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
  #   ]:each_slice(2)>

Enumerator#with_object is passed an array holding an initially-empty hash (represented by the block variable h) and an initially-empty array for the "extras" (represented by the block variable ex). Sending it to enum1 creates another enumerator, which you can think of as a "compound enumerator" (note the reference to each_slice(2)>:with_object([{},[]]) below).

enum2 = enum1.with_object([{},[]])
  #=> #<Enumerator: #<Enumerator: [
  #      "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  #      "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  #      ...
  #      "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
  #   ]:each_slice(2)>:with_object([{},[]])>

We can convert enum2 to an array to see what it will be passing into its block:

enum2.to_a
#=> [[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"],
#       [{}, []]],
#    [["8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks"],
#       [{}, []]],
#    [["3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8", "refs/heads/elab"],
#       [{}, []]],
#    [["d38a9a26ef887c08b306bdab210b39882f58e587", "refs/heads/elab_6.1"],
#       [{}, []]],
#    [["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/master"],
#       [{}, []]],
#    [["906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"],
#       [{}, []]]]
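One detail that is easy to miss in that output: the [{}, []] in every element is the same object, which is why changes made to h and ex in one iteration are visible in the next. A small sketch to check this, using made-up short data; pairs and same are illustrative names:

```ruby
arr = ["aaa", "x", "bbb", "y"] # shortened, made-up data

enum2 = arr.each_slice(2).with_object([{}, []])
pairs = enum2.to_a
# => [[["aaa", "x"], [{}, []]], [["bbb", "y"], [{}, []]]]

# The second member of every element is the very same object, not a copy.
same = pairs[0][1].equal?(pairs[1][1])  # => true
```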

The first element that enum2 passes into its block is

[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"], [{}, []]]

The block variables are therefore assigned as follows:

k => "19d97e408ee3f993745b053e281ac9dc69519e06"
v => "refs/heads/auto"
h => {}
ex => []

We now use Hash#update (aka Hash#merge!) to merge {k=>v} into h (h initially being empty). Therefore

h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov }

becomes

h.update({"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"})

followed by the block

{ |k, ov, nv| ex << {k=>nv}; ov }

but the block is only called when the hash being updated (h) and the hash being merged in (update's argument) share a key k, in which case ov and nv are the values associated with that key in h and in the hash being merged in, respectively. The merged value for key k will be whatever the block returns. Here h is empty, so there is no shared key and the block is not called; it will come into play when we encounter duplicates.
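As a standalone illustration of that block behaviour, here is a sketch with a tiny made-up hash rather than the data above:

```ruby
h  = { "a" => 1 }
ex = []

# No shared key: the block is not called and the pair is simply added.
h.update({ "b" => 2 }) { |k, ov, nv| ex << { k => nv }; ov }

# Shared key "a": the block is called, records the new value, returns the old one.
h.update({ "a" => 9 }) { |k, ov, nv| ex << { k => nv }; ov }

h   # => {"a"=>1, "b"=>2}
ex  # => [{"a"=>9}]
```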

So now

h #=> {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"}

We continue in this way for each of the other elements of enum2. When we encounter

k = "19d97e408ee3f993745b053e281ac9dc69519e06"
v = "refs/heads/master"
h = {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
      "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
      "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
      "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1"}

we find that k is already in h, so the block is evaluated to determine the value of k in the merged hash. We want to keep the current value h[k], which is ov, so that is what the block returns. First, however, we append the duplicate key-value pair, expressed as a hash, to the (still empty) array ex:

ex << {"19d97e408ee3f993745b053e281ac9dc69519e06" => "refs/heads/master"}
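If the chained enumerators are hard to follow, the whole walk-through can be unrolled into an explicit loop. This is only an equivalent sketch under the same "keep the first value, record the rest" rule, not a replacement for the method above; the short keys are made up:

```ruby
arr = ["aaa", "x", "bbb", "y", "aaa", "z"] # shortened, made-up data

h  = {}  # first value seen for each key
ex = []  # later key-value pairs that would otherwise be dropped

arr.each_slice(2) do |k, v|
  if h.key?(k)
    ex << { k => v }  # duplicate key: record it, leave h[k] untouched
  else
    h[k] = v          # first time this key is seen
  end
end

h   # => {"aaa"=>"x", "bbb"=>"y"}
ex  # => [{"aaa"=>"z"}]
```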