Jena SDB IRI validation

https://stackoverflow.com/questions/14478969

17-01-2022

Question

I have several unusual IRIs that I want to insert into Jena SDB, but I get the following error messages:

1. http://example.org/text/1234#offset_2311_2317_10-12%
   The error message is:
   Code: 30/ILLEGAL_PERCENT_ENCODING in FRAGMENT: The host component a percent occurred without two following hexadecimal digits.

2. http://example.org/text/5678#offset_365_370_NDZ#2
   The error message is:
   Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.

3. http://example.org/text/7890#offset_8872_8878__"Fren
   The error message is:
   Code: 4/UNWISE_CHARACTER in FRAGMENT: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.

The strings 10-12%, NDZ#2 and _"Fren are extracted from a plain-text document, and I have to append them directly to the end of the IRIs. So my question is: are these valid IRIs? If not, given that I need to append plain text to the end of an IRI, how can I convert them to valid IRIs?

Solution

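IRI 1 is invalid because it ends in %. The % character introduces a percent-encoded byte, so it must be followed by two hexadecimal digits (%xx). To include a literal %, encode it as %25.

IRI 2 is invalid because it contains two # characters, and an IRI can have only one fragment. Use %23 if you mean # as a character rather than as the fragment delimiter.

IRI 3 has " in it. Encode that as %22.

Spaces are a bad idea as well. Use %20.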
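
The encoding rules above can be applied mechanically before the text is appended to the IRI. The following is a minimal sketch in plain Java, not part of the Jena API: the class name IriFragmentEncoder and the methods encodeFragment and buildIri are hypothetical names chosen for illustration. It percent-encodes every UTF-8 byte that is not an RFC 3986 unreserved character, which is stricter than the IRI grammar requires but keeps the result safe in a fragment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Jena.
final class IriFragmentEncoder {

    // RFC 3986 "unreserved" characters, which never need escaping.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes each UTF-8 byte of the text that is not unreserved.
    static String encodeFragment(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (UNRESERVED.indexOf(v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    // Joins base IRI, the already-safe offset prefix, and the encoded text.
    static String buildIri(String base, String offsetPrefix, String text) {
        return base + "#" + offsetPrefix + encodeFragment(text);
    }

    public static void main(String[] args) {
        // prints http://example.org/text/1234#offset_2311_2317_10-12%25
        System.out.println(buildIri("http://example.org/text/1234", "offset_2311_2317_", "10-12%"));
        // prints http://example.org/text/5678#offset_365_370_NDZ%232
        System.out.println(buildIri("http://example.org/text/5678", "offset_365_370_", "NDZ#2"));
        // prints http://example.org/text/7890#offset_8872_8878_%22Fren
        System.out.println(buildIri("http://example.org/text/7890", "offset_8872_8878_", "\"Fren"));
    }
}
```

Only the extracted text is encoded here. The # that separates the base from the fragment is added afterwards, so it stays a real delimiter. To recover the original text later, the fragment has to be percent-decoded again.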

Licensed under: CC-BY-SA with attribution
