Titleize with roman numerals, dashes, apostrophes, etc. in Ruby on Rails

https://stackoverflow.com/questions/23556059

18-07-2023

Question

I'm trying to convert uppercased company names into properly capitalized names.

Company names can include:

Dashes
Apostrophes
Roman Numerals
Text like LLC, LP, INC which should stay uppercase.

I thought I might be able to use acronyms like this:

ACRONYMS = %W( LP III IV VI VII VIII IX GI)
ActiveSupport::Inflector.inflections(:en) do |inflect|
  ACRONYMS.each { |a| inflect.acronym(a) }
end

However, acronym matching does not take word boundaries into account, so registering VI and VII breaks ordinary words. For example, "ADVISORS".titleize returns "Ad VI Sors", because the VI inside the word is treated as a whole word.

Dashes are also removed.

It seems like there should be a generic gem for this generic problem, but I didn't find one. Is this problem really not that common? What's the best solution besides completely hacking the current inflection library?

Solution

Company names are a little odd, since they are often marks (as in service mark) rather than proper names. That means the precise capitalization might actually matter, and trying to titleize them might not be worth it.

In any case, here's a pattern that might work. Build your list of tokens to "keep", then manually split the string up and titleize the non-token parts.

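# Make sure you put long strings before short (VII before VI)
word_tokens = %w{VII VI IX XI}
# Special characters need to be separate, since they never appear as "part" of another word
special_tokens = %w{-}
# Builds a regex like /(\bVII\b|\bVI\b|\bIX\b|\bXI\b|\-)/ that wraps "word tokens" in a word boundary check.
# The capture group makes split keep the matched tokens in its result.
word_patterns = word_tokens.map { |t| "\\b#{Regexp.escape(t)}\\b" }
special_patterns = special_tokens.map { |t| Regexp.escape(t) }
token_regex = /(#{(word_patterns + special_patterns).join("|")})/
title = "ADVISORS-XI"
# Tokens pass through unchanged; everything else is titleized
title.split(token_regex).map { |s| s =~ token_regex ? s : s.titleize }.join
# => "Advisors-XI" (with the default inflections)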
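
The same idea can be sketched in plain Ruby, without the inflector, which may help if the keep-list also has to cover LLC, LP, INC and names with apostrophes. This is only a rough sketch under stated assumptions: titleize_company, capitalize_word and KEEP_UPPERCASE are hypothetical names made up for illustration, the token list is an example, and the apostrophe rule (capitalize the part after an apostrophe only when it has more than one letter) is a heuristic that will not suit every name.

```ruby
# Hypothetical keep-list: tokens that should stay uppercase (example values only).
KEEP_UPPERCASE = %w[LLC LP INC II III IV VI VII VIII IX XI].freeze

# Capitalizes one word, treating apostrophes as internal separators.
# Heuristic (assumption): a part after an apostrophe is capitalized only
# when it is longer than one letter, so O'BRIEN -> O'Brien, MACY'S -> Macy's.
def capitalize_word(word)
  word.split("'").each_with_index.map { |part, i|
    i.zero? || part.length > 1 ? part.capitalize : part.downcase
  }.join("'")
end

# Rewrites each run of letters (with optional internal apostrophes) and
# leaves everything else (dashes, spaces, commas, ampersands) untouched.
def titleize_company(name, keep: KEEP_UPPERCASE)
  name.gsub(/[[:alpha:]]+(?:'[[:alpha:]]+)*/) do |word|
    keep.include?(word.upcase) ? word.upcase : capitalize_word(word)
  end
end

puts titleize_company("ADVISORS-XI")                       # prints Advisors-XI
puts titleize_company("SMITH & O'BRIEN HOLDINGS VII, LLC") # prints Smith & O'Brien Holdings VII, LLC
puts titleize_company("MACY'S INC")                        # prints Macy's INC
```

Because whole words are compared against the keep-list, the VI inside ADVISORS is never considered, and the order of the tokens does not matter here. The trade-off is that a real word which happens to match a token (a company called "MIX" would be safe, one called "VI" would not) stays uppercase.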

Licensed under: CC-BY-SA with attribution
