Read this article, which describes exactly your problem: http://www.spacevatican.org/2012/7/7/stripping-invalid-utf-8/
Here is the solution code from that article:
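# Treat the bytes as UTF-8, transcode to UTF-16 while dropping invalid
# byte sequences, then transcode back to UTF-8.
html = html.force_encoding('UTF-8').
  encode('UTF-16', :invalid => :replace, :replace => '').
  encode('UTF-8')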
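The detour through UTF-16 is needed because Ruby's `String#encode` treats a conversion from an encoding to the same encoding as a no-op, so `encode('UTF-8', :invalid => :replace)` on a UTF-8 string would not touch the invalid bytes. A minimal sketch of the same round trip as a reusable helper follows. The method name `strip_invalid_utf8` is hypothetical. It calls `dup` first so that `force_encoding` does not mutate (or fail on) a frozen caller string:

```ruby
# Hypothetical helper wrapping the UTF-16 round trip from the article.
# 1. dup: avoid mutating the caller's string (force_encoding works in place).
# 2. force_encoding: reinterpret the raw bytes as UTF-8.
# 3. encode to UTF-16: a real transcode, so invalid UTF-8 sequences are
#    detected and replaced with '' (i.e. dropped).
# 4. encode back to UTF-8: every UTF-16 character maps cleanly back.
def strip_invalid_utf8(str)
  str.dup.
    force_encoding('UTF-8').
    encode('UTF-16', :invalid => :replace, :replace => '').
    encode('UTF-8')
end

dirty = "abc\xFFdef"
clean = strip_invalid_utf8(dirty)
puts clean                   # abcdef
puts clean.valid_encoding?   # true
```

On Ruby 2.1 and later, `String#scrub('')` should produce the same result without the round trip.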