Question

The main question is a bit short, so I'll elaborate. I'm building an app for Twitter with which you can do the basic actions (get posts, post, reply, etc.).

Now I figured it would be a good idea to check the 140-character limit in my app. So far so good; then someone asked if I could also handle the URL-shortener thing.

So at the moment I have a regex that picks up most URLs (in fact too many), takes their length, and either adds or deducts the difference from the 140 max. It's still a bit buggy, but I can manage that.

Now my problem....

It seems Twitter is quite picky about what it considers a URL. I've got the most basic ones (starting with http(s):// and such), but Twitter also replaces some TLDs very readily ((www.)google.com and [whatever].net/.biz/.info are just a few of them), but not .nl, .de or .tk.

Now I was wondering whether someone has found out which ones they do and which ones they don't 'shorten'.

Because I'm pretty sure my regex isn't the best either, I'll drop that here as well:

((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?)|([\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?)

Solution 3

I figured it out: I found a pretty important line on the TLD Wikipedia page. It states that all country TLDs are two characters long, and also the other way around: all two-character TLDs are countries. With that in mind, I started testing a bunch of them with Twitter, and I'm pretty sure I now know which URLs Twitter shortens and which ones it doesn't.

  • All URLs starting with http:// or https://
  • All URLs like [something].[non-country TLD] # .com, .biz, .mobi etc. (except .arpa and .aero)
  • All URLs like [something].[something].[valid TLD] # including countries

  • Links like http://[user]:[pass]@[something].[tld] will NOT be shortened
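The rules above can be sketched as a small check on a single whitespace-free token. This is only an illustration of the list, not Twitter's actual logic: the function name `looksShortened` and the set `GENERIC_TLDS` are made up here, the generic TLD list is the one used in the regex further down, and any two-letter TLD is assumed to be a valid country code.

```javascript
// Generic TLDs taken from the regex in this answer; .arpa and .aero are
// left out on purpose because they were observed not to be shortened.
const GENERIC_TLDS = new Set([
  "com", "asia", "cat", "coop", "edu", "int", "tel", "pro", "org", "net",
  "gov", "mil", "biz", "info", "mobi", "name", "jobs", "museum", "travel"
]);

// Hypothetical helper: returns true if the token matches the observed rules.
function looksShortened(token) {
  const lower = token.toLowerCase();

  // Rule 1: anything starting with http:// or https:// ...
  const scheme = /^https?:\/\//.exec(lower);
  if (scheme) {
    // ... except links carrying credentials (user:pass@host).
    // Assumption: an "@" before the first "/", "?" or "#" means credentials.
    const authority = lower.slice(scheme[0].length).split(/[\/?#]/)[0];
    return !authority.includes("@");
  }

  // No scheme: look at the host part only (cut at path, query, fragment, port).
  const host = lower.split(/[\/?#:]/)[0];
  const labels = host.split(".");
  if (labels.length < 2 || labels.some((l) => !/^[-\w]+$/.test(l))) {
    return false;
  }
  const tld = labels[labels.length - 1];

  // Rule 2: [something].[non-country TLD]
  if (GENERIC_TLDS.has(tld)) {
    return true;
  }

  // Rule 3: [something].[something].[country TLD] needs at least three labels.
  // Assumption: any two-letter TLD counts as a country code.
  return /^[a-z]{2}$/.test(tld) && labels.length >= 3;
}
```

For example, `looksShortened("google.com")` and `looksShortened("www.example.nl")` return true, while `looksShortened("example.nl")` returns false.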

Now to build a regex for it; I'll post it here as soon as I think I have it :D

This is what I've got so far:

/(^(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?:(?:[-\w]+\.)+(?:com|asia|cat|coop|edu|int|tel|pro|org|net|gov|mil|biz|info|mobi|name|jobs|museum|travel|([a-z]{2})))(?::[\d]{1,5})?(?:(?:(?:\/(?:[-\w~!$+|.,=\(\)]|%[a-f\d]{2})+)+|\/)+|\?|#)?(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?)/gim;

One major flaw is still in it: it also accepts a bare [domain].[tld] with a two-letter country TLD, which Twitter doesn't shorten.
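One possible way around that flaw, sketched for the host part only (no scheme, path or query), is to split the TLD alternation in two: generic TLDs may follow a single label, while a two-letter TLD needs at least two labels in front of it. The names `GENERIC` and `bareHostPattern` are made up for this sketch, and it has not been checked against Twitter itself.

```javascript
// Generic TLD list copied from the regex above.
const GENERIC =
  "com|asia|cat|coop|edu|int|tel|pro|org|net|gov|mil|biz|info|mobi|name|jobs|museum|travel";

// Hypothetical host-only pattern for links written without http:// or https://.
const bareHostPattern = new RegExp(
  "^(?:" +
    "(?:[-\\w]+\\.)+(?:" + GENERIC + ")" + // one or more labels + generic TLD
    "|" +
    "(?:[-\\w]+\\.){2,}[a-z]{2}" +         // two or more labels + two-letter TLD
  ")$",
  "i"
);

console.log(bareHostPattern.test("google.com"));     // generic TLD, one label
console.log(bareHostPattern.test("example.nl"));     // country TLD, one label
console.log(bareHostPattern.test("www.example.nl")); // country TLD, two labels
```

Links that do start with http:// or https:// would still need to be handled separately, since those are shortened whatever the TLD.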

I hope this will help someone in the future. I'm pretty sure there isn't a whole lot of easy-to-find info about this on the web (or at least I couldn't find it).

OTHER TIPS

http://support.twitter.com/articles/78124-how-to-shorten-links-urls# indicates that all URLs posted to Twitter will be rewritten to be exactly 19 characters long.
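If every URL really ends up at a fixed length, the character check from the question becomes simple: count each detected URL as 19 characters instead of its real length. A minimal sketch, assuming the 19-character figure from that article and a deliberately simple stand-in matcher that only catches http:// and https:// links (the names `effectiveLength` and `remainingCharacters` are made up here):

```javascript
// Assumption: every URL is rewritten to exactly 19 characters.
const SHORT_URL_LENGTH = 19;
const MAX_TWEET_LENGTH = 140;

// Stand-in matcher, not Twitter's real detection: scheme-prefixed URLs only.
const URL_PATTERN = /https?:\/\/\S+/g;

// Length of the text once each detected URL is counted as SHORT_URL_LENGTH.
function effectiveLength(text) {
  const placeholder = "x".repeat(SHORT_URL_LENGTH);
  return text.replace(URL_PATTERN, placeholder).length;
}

// Characters left before the 140 limit is reached (negative if over).
function remainingCharacters(text) {
  return MAX_TWEET_LENGTH - effectiveLength(text);
}

console.log(remainingCharacters("Read this: http://example.com/some/very/long/path?with=query"));
```

Note that a URL shorter than 19 characters makes the counted length grow rather than shrink, which is why the difference can go either way.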

I am using this:

var url_expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;

Nobody has complained :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow