Question

I want to write an online application that:

  1. reads the URL from the address bar of the browser
  2. extracts its lexical features (like n-grams)
  3. extracts its host-based features (fetches DNS records online: the A, PTR, and TTL fields)
  4. classifies the URL as malicious or benign (using machine learning)

Can anyone help me with 1 and 3?


Solution

I don't believe this application is something you can accomplish, since you can't really determine a site's content from its URL alone.

See something like the Mozilla Phishing Protection Design Documentation and the Google Safe Browsing spec instead.

OTHER TIPS

You didn't say which language you're working in.

For item 1, here is a .NET library that may be helpful:

http://msdn.microsoft.com/en-us/library/system.web.httputility.aspx
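
If .NET isn't a requirement, here is a rough Python sketch of items 2 and 3 using only the standard library. It assumes the URL string has already been captured; reading the address bar itself has to happen in the browser (for example via `window.location.href` in JavaScript) and be passed to this code. The function names `char_ngrams`, `lexical_features`, and `host_features` and the particular features chosen are my own illustrations, not an established feature set. Note that the `socket` module does not expose TTL values, so for TTL you would need a dedicated DNS library such as dnspython.

```python
import socket
from collections import Counter
from urllib.parse import urlsplit


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def lexical_features(url, n=3):
    """Illustrative lexical features; the selection here is an assumption."""
    host = urlsplit(url).hostname or ""
    return {
        "host": host,
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),
        "ngrams": char_ngrams(url.lower(), n),
    }


def host_features(host):
    """Look up A records and a PTR name (needs network access; no TTL)."""
    try:
        _, _, addresses = socket.gethostbyname_ex(host)
    except OSError:
        # Resolution failed: report no host-based information.
        return {"a_records": [], "ptr": None}
    ptr = None
    if addresses:
        try:
            # Reverse lookup of the first address only.
            ptr = socket.gethostbyaddr(addresses[0])[0]
        except OSError:
            ptr = None
    return {"a_records": addresses, "ptr": ptr}
```

You would call `lexical_features(url)` first, then pass its `"host"` value to `host_features`, and feed the combined values to whatever classifier you pick for item 4.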

Licensed under: CC-BY-SA with attribution