For the tag array, try a single 1-D memory of Bits whose rows are (tagbits*n_ways) wide, one row per set. On a cache access you read out the tags for every way in the set anyway, so you want to store them as densely as possible. Something like this:
val tag_array = Mem(Bits(width = tagbits*n_ways), n_sets, seqRead = true)
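To make the packing concrete, here is a minimal software model in plain Scala (not Chisel hardware). It assumes way i sits in bits [(i+1)*tagbits-1 : i*tagbits] of the row; that ordering, the object name PackedTagRow, and the example sizes are my assumptions, and valid bits are not modeled.

```scala
// Software model only: shows how one wide row holds one tag per way.
// tagBits and nWays are hypothetical example values.
object PackedTagRow {
  val tagBits = 20
  val nWays = 4
  val tagMask: BigInt = (BigInt(1) << tagBits) - 1

  // Pack one tag per way into a single row; way i is assumed to land in
  // bits [(i+1)*tagBits-1 : i*tagBits].
  def pack(tags: Seq[BigInt]): BigInt =
    tags.zipWithIndex.foldLeft(BigInt(0)) { case (row, (tag, way)) =>
      row | ((tag & tagMask) << (way * tagBits))
    }

  // Slice way i's tag back out of the row.
  def tagOf(row: BigInt, way: Int): BigInt =
    (row >> (way * tagBits)) & tagMask

  // Compare every way's tag against the request tag (one Boolean per way).
  def tagMatch(row: BigInt, reqTag: BigInt): Seq[Boolean] =
    (0 until nWays).map(way => tagOf(row, way) == (reqTag & tagMask))
}
```

In the hardware version the same slice would be a bit extract on the row read from tag_array, compared against the request tag to drive each way's match signal.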
And here's a snippet of pseudo-code for an i-cache's memory banks, which spans three pipeline stages (s0, s1, s2): ifgen, ic_access, and ic_response:
val s1_tag_match = Vec.fill(n_ways){Bool()}
val s2_tag_hit = Vec.fill(n_ways){Bool()}
val s2_dout = Vec.fill(n_ways){Reg(Bits())}
for (i <- 0 until n_ways)
{
  // notice each cycle of refill gets its own row in the array
  val data_array = Mem(Bits(width = n_code_width), n_sets*REFILL_CYCLES, seqRead = true)
  val s1_raddr = Reg(UInt())
  // refill: write one row per refill cycle into the way being replaced
  when (io.mem.resp.valid && repl_way === UInt(i))
  {
    data_array(Cat(s2_idx,rf_cnt)) := io.mem.resp.bits.data
  }
  // read enable: latch the read address when not refilling
  .elsewhen (s0_valid)
  {
    s1_raddr := s0_raddr
  }
  // read: only the way whose tag matched captures data
  when (s1_valid && s1_tag_match(i) && ready)
  {
    s2_dout(i) := data_array(s1_raddr)
  }
}
io.resp.bits.data := Mux1H(s2_tag_hit, s2_dout)
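The two less obvious pieces above, the Cat(s2_idx,rf_cnt) refill address and the final Mux1H, can be sketched as a plain Scala software model (not Chisel). This assumes REFILL_CYCLES is a power of two and that rf_cnt is exactly log2(REFILL_CYCLES) bits wide. The object name ICacheBankModel and the example size are hypothetical, and returning 0 when no way hits is a choice of this model, not a statement about the hardware.

```scala
// Software model only: refill addressing and one-hot way selection.
object ICacheBankModel {
  val refillCycles = 4 // hypothetical; assumed to be a power of two
  // log2(refillCycles), valid because refillCycles is a power of two
  val rfCntBits: Int = Integer.numberOfTrailingZeros(refillCycles)

  // Models Cat(s2_idx, rf_cnt): set index in the upper bits, refill
  // cycle count in the lower bits, so each set owns refillCycles rows.
  def refillAddr(idx: Int, rfCnt: Int): Int = {
    require(rfCnt >= 0 && rfCnt < refillCycles)
    (idx << rfCntBits) | rfCnt
  }

  // Models a one-hot mux: OR together the data of every selected way.
  // With exactly one select set, that is just the hitting way's data.
  def mux1H(sel: Seq[Boolean], data: Seq[BigInt]): BigInt = {
    require(sel.length == data.length)
    sel.zip(data).collect { case (true, d) => d }.foldLeft(BigInt(0))(_ | _)
  }
}
```

The one-hot form avoids a priority encoder: since at most one way can hit, an AND-OR tree over s2_tag_hit and s2_dout is enough.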