문제

I'm using Jsoup with relaxed whitelist. It seems perfect but I would like to keep the embedded images tags like <img alt="" src="data:;base64.

Is there a way to modify the whitelist to accept also those img?

Edit:

If I use Whitelist.relaxed().addProtocols("img","src","data") then those img tags are not removed. But it accepts anything after "data:" and I would like just to keep them if src content starts with "data:;base64". Is it possible with jsoup?

도움이 되었습니까?

해결책

You can extend Whitelist and override isSafeAttribute to perform custom checks. As there's no way to extend Whitelist.relaxed() directly, you'll have to copy some code to set up the same list:

public class RelaxedPlusDataBase64Images extends Whitelist {
    public RelaxedPlusDataBase64Images() {
        //copied from Whitelist.relaxed()
        addTags("a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                "i", "img", "li", "ol", "p", "pre", "q", "small", "strike", "strong",
                "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                "ul");
        addAttributes("a", "href", "title");
        addAttributes("blockquote", "cite");
        addAttributes("col", "span", "width");
        addAttributes("colgroup", "span", "width");
        addAttributes("img", "align", "alt", "height", "src", "title", "width");
        addAttributes("ol", "start", "type");
        addAttributes("q", "cite");
        addAttributes("table", "summary", "width");
        addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width");
        addAttributes("th", "abbr", "axis", "colspan", "rowspan", "scope", "width");
        addAttributes("ul", "type");
        addProtocols("a", "href", "ftp", "http", "https", "mailto");
        addProtocols("blockquote", "cite", "http", "https");
        addProtocols("cite", "cite", "http", "https");
        addProtocols("img", "src", "http", "https");
        addProtocols("q", "cite", "http", "https");
    }

    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        return ("img".equals(tagName)
                && "src".equals(attr.getKey())
                && attr.getValue().startsWith("data:;base64")) ||
            super.isSafeAttribute(tagName, el, attr);
    }
}

As you haven't provided the code you're using to parse or the HTML you're sanitizing, I haven't tested this.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top