Question

How can I validate that a text column doesn't contain websites? Examples include:

www.google.com
google.com
http://gooogle.com
http://www.google.com
https://www.google.com
https://google.com

I want to do this on the front end, but on the back end as well. I'm more interested in the back end at the moment, as I will deal with the front end later.

Question update:

Based on the example provided by MrYoshiji, I've come up with a case that is not covered:

http://rubular.com/r/VGgWyfIt7R

See the http://www.google.com in the middle of the text? It is not matched, and that is exactly what I need to match, so that I can raise a validation error saying that websites are not allowed.


Solution

I found a robust regexp; credit goes to @PhillPafford (PHP RegEx for "Website Name"; if you upvote my answer, please upvote his first!):

/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?$/

To see it in action:

http://rubular.com/r/GOHHrucCdX
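A minimal sketch for checking the anchored pattern from plain Ruby (the constant name WEBSITE_LINE is hypothetical). It runs the question's six examples and also shows two consequences of the anchors: a URL in the middle of a sentence is not matched, and, because ^ and $ in Ruby match at line boundaries rather than string boundaries, a URL sitting alone on its own line is matched.

```ruby
# Anchored pattern from the answer (hex ranges written as A-F).
# In Ruby, ^ and $ match at the start/end of each LINE, not of the whole string.
WEBSITE_LINE = /^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?$/

examples = %w[
  www.google.com
  google.com
  http://gooogle.com
  http://www.google.com
  https://www.google.com
  https://google.com
]

# Each of the question's examples is matched when it is the whole value.
examples.each do |example|
  puts "#{example}: #{example.match?(WEBSITE_LINE)}"
end

# A URL surrounded by other words is not matched by the anchored pattern...
puts "Visit http://www.google.com today".match?(WEBSITE_LINE)  # => false
# ...but a URL on its own line is, because ^ and $ are line anchors.
puts "hello\nwww.google.com".match?(WEBSITE_LINE)             # => true
```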


UPDATE:

This one will find website names anywhere in the text:

/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/

Note that I removed the ^ at the start and the $ at the end so that it matches anywhere within the text. From the Rubular reference:

^ Start of line

$ End of line

http://rubular.com/r/iEVzfv2U3O
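A minimal sketch of using the unanchored pattern as a back-end check, assuming a plain Ruby helper (contains_website? and WEBSITE_ANYWHERE are hypothetical names). In a Rails model the same pattern could be passed to a format validation with the without option; the sketch stays framework-free so it runs on its own. String#[] with a regexp returns the first matching substring, which is handy for an error message.

```ruby
# Unanchored pattern from the update (hex ranges written as A-F):
# finds a website-like token anywhere in the string.
WEBSITE_ANYWHERE = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/

# Hypothetical helper: true if the text contains something URL-like.
# nil is treated as an empty string.
def contains_website?(text)
  text.to_s.match?(WEBSITE_ANYWHERE)
end

sentence = "Call me or visit http://www.google.com for details"

puts contains_website?(sentence)                     # => true
puts contains_website?("No links in this sentence")  # => false
puts contains_website?(nil)                          # => false

# The first URL-like substring, e.g. for a validation message:
puts sentence[WEBSITE_ANYWHERE]                      # => http://www.google.com
```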


@GandalfStormCrow noticed that the following is matched:

Since I was little.My first dog
                #^^^

The only workaround I see is to insert a space after such a period, turning little.My into little. My:

text.gsub(/\w\.[A-Z]/) { |matched_string| matched_string.gsub('.', '. ') }

See it in action:

1.9.3p489 :018 > text = "hello my name is robert.My dog"
 => "hello my name is robert.My dog" 
1.9.3p489 :019 > text.gsub(/\w\.[A-Z]/) { |matched_string| matched_string.gsub('.', '. ') }
 => "hello my name is robert. My dog" 
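A sketch combining the workaround with the unanchored pattern, assuming the normalization runs just before the check (both helper names are hypothetical). It shows the false positive being suppressed, and also the trade-off: a capitalised domain such as Google.Com is split as well, so it is no longer detected.

```ruby
# Unanchored pattern from the answer (hex ranges written as A-F).
WEBSITE_ANYWHERE = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/

# Workaround from the answer: a period between a word character and a
# capital letter ("little.My") is treated as a missing sentence break.
def normalize_sentence_breaks(text)
  text.gsub(/\w\.[A-Z]/) { |matched_string| matched_string.gsub('.', '. ') }
end

# Hypothetical helper: normalize first, then look for a website.
def contains_website?(text)
  normalize_sentence_breaks(text.to_s).match?(WEBSITE_ANYWHERE)
end

puts "Since I was little.My first dog".match?(WEBSITE_ANYWHERE) # => true (false positive)
puts contains_website?("Since I was little.My first dog")        # => false
puts contains_website?("Visit www.google.com today")             # => true
# Trade-off: capitalised domains are split too and slip through.
puts contains_website?("Visit Google.Com today")                 # => false
```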

Other answers

In your model, add:

validates_format_of :your_column, without: /\A((https?:)?\/\/)?(www\.)?[a-zA-Z]+\.com\z/
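A quick way to see what this kind of pattern accepts, sketched in plain Ruby outside a model. The constant names are hypothetical; LOOSE is the pattern as originally posted, with unescaped dots, and DOT_COM_ONLY escapes them. Unescaped, . matches any character, so a value like googlexcom is rejected as a website. Because \A and \z anchor the whole value, only a value that is entirely a .com address is rejected; a URL inside a sentence still passes.

```ruby
# As originally posted: the unescaped dots match ANY character.
LOOSE = /\A((http(s)?:)?\/\/(www)?.)?(www.)?[a-zA-Z]*.com\z/
# Dots escaped; \A and \z anchor the whole value.
DOT_COM_ONLY = /\A((https?:)?\/\/)?(www\.)?[a-zA-Z]+\.com\z/

%w[
  www.google.com
  google.com
  http://gooogle.com
  http://www.google.com
  https://www.google.com
  https://google.com
].each { |value| puts "#{value}: #{value.match?(DOT_COM_ONLY)}" } # all true

puts "googlexcom".match?(LOOSE)                     # => true (unescaped dot)
puts "googlexcom".match?(DOT_COM_ONLY)              # => false
puts "Visit google.com today".match?(DOT_COM_ONLY)  # => false (anchored)
```

In a model, the escaped pattern would be the value passed to without: in the validates_format_of line above.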

Here's where I crafted the Regex

Many answers here take the regex approach, but you may prefer a simple blacklist. Something like:

class MyModel < ActiveRecord::Base
  BLACKLIST = [
    'google.com'
  ]

  validate :disallow_blacklisted_urls

  private
  def disallow_blacklisted_urls
    BLACKLIST.each do |blacklisted_url|
      if my_field && my_field.include?(blacklisted_url)
        errors.add(:my_field, "must not contain #{blacklisted_url}")
      end
    end
  end
end

I'd go this way because you can easily add more URLs (facebook.com, twitter.com), and the code stays clear even after a year of not looking at it, whereas the regex is too cryptic for my aging eyes (and brain :-)). Also, if you don't want to check the whole blacklist every time, you can add a break inside the validation loop's condition, but I think reporting every match gives the user better feedback.
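The trade-off between reporting every blacklisted URL and stopping at the first one can be sketched in plain Ruby, without ActiveRecord. The list entries and method names are hypothetical, and the case-insensitive comparison is an assumption that the original loop does not make. Like the original, it uses substring matching, so notgoogle.com would also be flagged.

```ruby
# Hypothetical blacklist; add entries as needed.
BLACKLIST = ['google.com', 'facebook.com', 'twitter.com'].freeze

# Every blacklisted domain found (one error per match, as in the loop above).
# Comparison is case-insensitive here, which is an assumption.
def blacklisted_domains_in(text)
  return [] if text.nil?
  lowered = text.downcase
  BLACKLIST.select { |domain| lowered.include?(domain) }
end

# Stops at the first hit, like adding a break to the loop.
def first_blacklisted_domain_in(text)
  return nil if text.nil?
  lowered = text.downcase
  BLACKLIST.find { |domain| lowered.include?(domain) }
end

text = "Find me on Twitter.com or facebook.com"
p blacklisted_domains_in(text)       # => ["facebook.com", "twitter.com"]
p first_blacklisted_domain_in(text)  # => "facebook.com"
```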

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow