How can I find all the Guids in some text?
Question
I've got a bunch of web page content in my database with links like this:
<a href="/11ecfdc5-d28d-4121-b1c9-1f898ac0b72e">Link</a>
That Guid is the ID of another page in the same database.
I'd like to crawl those pages and check for broken links.
To do that I need a function that can return a list of all the Guids on a page:
Function FindGuids(ByVal Text As String) As Collections.Generic.List(Of Guid)
    ...
End Function
I figure this is a job for a regular expression, but I don't know the syntax.
Solution
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Function FindGuids(ByVal Text As String) As List(Of Guid)
    Dim Guids As New List(Of Guid)
    ' 8-4-4-4-12 hexadecimal digits, upper or lower case
    Dim Pattern As String = "[a-fA-F0-9]{8}-([a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12}"
    For Each m As Match In Regex.Matches(Text, Pattern)
        Guids.Add(New Guid(m.Value))
    Next
    Return Guids
End Function
OTHER TIPS
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
I suggest you grab a free copy of Expresso and learn to build them!
Here's a 10-second attempt with no optimization. It matches upper and lower case and creates a numbered capture group:
([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})
Then you just have to iterate through the matched groups...
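To illustrate the iteration step, here is a minimal C# sketch, assuming .NET 4 or later (for Guid.TryParse). The class name GuidFinder is hypothetical and only there to make the example self-contained; the pattern is the one shown in this tip.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this example.
static class GuidFinder
{
    // Pattern from the tip: one numbered capture group around the whole Guid.
    private static readonly Regex GuidPattern = new Regex(
        "([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})");

    public static List<Guid> FindGuids(string text)
    {
        var guids = new List<Guid>();
        foreach (Match m in GuidPattern.Matches(text))
        {
            // Groups[1] is the first numbered capture group of this match.
            Guid parsed;
            if (Guid.TryParse(m.Groups[1].Value, out parsed))
            {
                guids.Add(parsed);
            }
        }
        return guids;
    }
}
```

Because the group wraps the entire pattern, m.Groups[1].Value and m.Value hold the same text here; the group only matters if the pattern is later extended with surrounding context such as the href attribute.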
There are easier ways to check for broken links. For example, I think http://www.totalvalidator.com/ will do it :D
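This could also help. Note that the ^ and $ anchors make it validate a single candidate string rather than search inside a larger text: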
static Regex isGuid =
    new Regex(@"^(\{){0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}(\}){0,1}$", RegexOptions.Compiled);
and then
static bool IsGuid(string candidate, out Guid output)
{
    bool isValid = false;
    output = Guid.Empty;
    if (candidate != null)
    {
        if (isGuid.IsMatch(candidate))
        {
            output = new Guid(candidate);
            isValid = true;
        }
    }
    return isValid;
}