Ruby: Remove invisible characters after converting string to UTF-8

Question 1

Without seeing your code, it's hard to know exactly what's going on for you. I'll point out, however, that String#force_encoding doesn't transcode the String: it leaves the bytes untouched and only changes the encoding they are labeled with. It's a way of saying, "No, really, this is UTF-8", for example. To transcode from one encoding to another, use String#encode.

This seems to work for me:

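require 'net/http'

# Net::HTTP typically hands back the body as raw bytes (ASCII-8BIT)
s = Net::HTTP.get('www.eximsystems.com', '/LaVerdad/Antiguo/Gn/Genesis.htm')
s.force_encoding('windows-1252')   # declare what the bytes really are
s.encode!('utf-8')                 # then transcode them to UTF-8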
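To see the difference between the two calls without a network request, here is a small sketch. The string "caf\xE9" is a made-up stand-in for Windows-1252 bytes arriving off the wire, not data from that page.

```ruby
# "caf\xE9" is "café" in Windows-1252: é is the single byte 0xE9.
raw = "caf\xE9".b                 # binary copy, like a raw HTTP body

# force_encoding only relabels; the bytes stay exactly the same.
latin = raw.dup.force_encoding('Windows-1252')
p latin.bytes                     # => [99, 97, 102, 233]

# Mislabeling the same bytes as UTF-8 yields an invalid string.
p raw.dup.force_encoding('UTF-8').valid_encoding?   # => false

# encode actually transcodes: é becomes the two UTF-8 bytes C3 A9.
utf8 = latin.encode('UTF-8')
p utf8.bytes                      # => [99, 97, 102, 195, 169]
p utf8 == "café"                  # => true
```

The order matters: force_encoding first tells Ruby what the bytes are, and only then can encode convert them correctly.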

In general, /[[:space:]]/ matches more kinds of whitespace than /\s/: in Ruby, /\s/ is equivalent to /[ \t\r\n\f\v]/ and matches ASCII whitespace only, whereas /[[:space:]]/ also matches Unicode whitespace in a UTF-8 string. It doesn't appear to be necessary in this case, though; I can't find any abnormal whitespace in s at this point. If you're still having problems, you'll need to post your code and a more precise description of the issue.
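A quick way to check the difference is to test a non-breaking space (U+00A0, the character &nbsp; becomes once the HTML is parsed) against both patterns. This is an illustrative sketch, not code from the question.

```ruby
nbsp = "\u00A0"   # NO-BREAK SPACE

p nbsp.match?(/\s/)            # => false: \s is ASCII-only
p nbsp.match?(/[[:space:]]/)   # => true: [[:space:]] is Unicode-aware

# Collapsing whitespace with each pattern:
sample = "16\u00A0 Al punto"
p sample.gsub(/\s+/, ' ')          # the NBSP survives next to the ASCII space
p sample.gsub(/[[:space:]]+/, ' ') # => "16 Al punto"
```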

Update: Thanks for updating your question with your code and an example of the problem. It looks like the issue is non-breaking spaces. I think it's simplest to get rid of them at the source:

require 'nokogiri'
require 'open-uri'

URL = 'http://www.eximsystems.com/LaVerdad/Antiguo/Gn/Genesis.htm'

# Fetch, clean, and parse as three separate steps so that &nbsp; entities
# become ordinary ' ' characters in the raw HTML, before Nokogiri would
# convert them into Unicode non-breaking spaces (U+00A0).
s = URI.open(URL).read   # Kernel#open no longer opens URLs as of Ruby 3.0
s.gsub!('&nbsp;', ' ')
html = Nokogiri::HTML(s)

# Extract the text of every <p> element and concatenate it
text = html.css('p').map(&:text).join

# Clean up: collapse each run of whitespace into a single space
text.gsub!(/\s+/, ' ')

puts text
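If you'd rather not touch the raw HTML, a possible alternative (a sketch, assuming the extracted text is already valid UTF-8) is to normalize after parsing with [[:space:]]. That also covers numeric entities such as &#160;, which decode to the same non-breaking space but aren't caught by the '&nbsp;' substitution. squish_whitespace is a hypothetical helper name, not a Ruby or Nokogiri method.

```ruby
# Hypothetical helper (not part of Ruby or Nokogiri): collapse every run of
# Unicode whitespace, including U+00A0 NO-BREAK SPACE, into one ASCII space,
# then trim the ends.
def squish_whitespace(str)
  str.gsub(/[[:space:]]+/, ' ').strip
end

# Stand-in for text extracted by Nokogiri, where &nbsp; is already U+00A0.
extracted = "con \u00E9l.\u00A0\n\u00A016 Al punto"
puts squish_whitespace(extracted)   # prints: con él. 16 Al punto
```

Running gsub before strip means any whitespace left at the ends is a plain ASCII space, which strip can remove.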

There's now just a single, normal space between the period at the end of verse 15 and the verse number 16:

15) Besó también José a todos sus hermanos, orando sobre cada uno de ellos; después de cuyas demostraciones cobraron aliento para conversar con él. 16 Al punto corrió la voz, y se divulgó generalmente esta noticia en el palacio del rey: Han venido los hermanos de José; y holgóse de ello Faraón y toda su corte.

Question 2

You can use text.strip (String#strip) to remove leading and trailing whitespace; it leaves whitespace inside the string untouched.
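Keep in mind that String#strip only recognizes ASCII whitespace, so a leading or trailing non-breaking space survives it. One way to trim Unicode whitespace at both ends (a sketch using an anchored regex, not a built-in method) is:

```ruby
s = "\u00A0 Al punto corri\u00F3 la voz \u00A0"

p s.strip == s   # => true: strip stops at the U+00A0 on each end

# \A and \z anchor to the very start and end of the string.
trimmed = s.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
puts trimmed     # prints: Al punto corrió la voz
```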