Spring MVC and parsing HTML

https://stackoverflow.com/questions/1367882

21-09-2019
Question

I have a controller class as in the following code segment:

@Controller
public class SiteEntryController 
{
    @RequestMapping(value="/index.htm")
    public ModelAndView handleIndex(
            @RequestParam(value="enableStats", required=false) String enableStats) 
    {
        ModelMap map = new ModelMap();

        ...........

        return new ModelAndView("index", map); 
    }
}

and the ViewResolver is defined as follows:

<bean class="org.springframework.web.servlet.view.InternalResourceViewResolver" 
    p:prefix="/WEB-INF/jsp/" p:suffix=".jsp"/>

What I need to do is read and parse the content of index.jsp, which contains only HTML markup, and return some information about the tags used in it. I will work out how to display the gathered information later; first, I can't figure out how to access the content. I am familiar with Java but totally new to Spring, so my question may be a silly one. :) To be clear, my problem is not how to parse HTML, but how to get at the file content.

I have already done this with plain file operations, as follows:

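import java.io.FileReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Set;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class Main {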

private static HTMLEditorKit.ParserCallback callback;

/**
 * @param args
 */
public static void main(String[] args) {
    try {
        Reader r = new FileReader("D:/WS/TestP/resource/index.htm");
        ParserDelegator parser = new ParserDelegator();
        callback = new Detector();
        parser.parse(r, callback, false);
    } catch (Exception _ex) {
        _ex.printStackTrace();
    }
    HashMap<String, Integer> map = ((Detector) callback).getMap();
    Set<String> keys = map.keySet();
    Iterator<String> it = keys.iterator();
    String key;
    ArrayList<TagFrequency> list = new ArrayList<TagFrequency>();
    TagFrequency tf;
    int i = 0, j = 0;
    while (it.hasNext()) {
        key = it.next();
        i = map.get(key);
        tf = new TagFrequency(key, i);
        if (list.size() == 0)
            list.add(tf);
        else {
            j = 0;
            while (j < list.size() && tf.compareTo(list.get(j)) > 0) {
                j++;
            }
            if (j==list.size())
                list.add(tf);
            else {
                list.add(j, tf);
            }
        }
    }
    for (int ind = list.size(); ind>0 ; ind--) {
        System.out.println(list.get(ind-1).toString());
    }       
}
}

Detector.java:

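import java.util.HashMap;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

public class Detector extends HTMLEditorKit.ParserCallback {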

    private HashMap<String, Integer> map = new HashMap<String, Integer>();

    public Detector () {
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        String str = t.toString();
        if (map.get(str) == null) {
            map.put(str, Integer.valueOf(1));
        } else {
            map.put(str, map.get(str) + 1);
        }
    }

    public HashMap<String, Integer> getMap() {
        return map;
    }
}

and TagFrequency.java:

public class TagFrequency {
    private String tag;
    private Integer i;
    public TagFrequency(String tag, Integer i) {
        super();
        this.tag = tag;
        this.i = i;
    }
    public String getTag() {
        return tag;
    }
    public void setTag(String tag) {
        this.tag = tag;
    }
    public Integer getI() {
        return i;
    }
    public void setI(Integer i) {
        this.i = i;
    }
    public String toString() {
        return this.tag + " " + this.i;
    }

    public int compareTo(TagFrequency tf) {
        if (tf.i > this.i)
            return -1;
        else if (tf.i < this.i)
            return 1;
        else
            return (-1)*this.tag.compareToIgnoreCase(tf.tag);
    }
}

Solution

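You can implement this with a servlet filter that passes a response wrapper down the filter chain. The wrapper buffers the HTML that the JSP renders, and once the chain returns, the filter can run your parsing code on the buffered content.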

SiteMesh is a project that uses this technique, so you can take a peek at its code.
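As a rough illustration of the buffering idea, here is a minimal sketch that uses only JDK classes so it can run outside a servlet container. The class name CapturedHtml and the method countStartTags are hypothetical, made up for this example. In a real filter, the buffering part would presumably live in a subclass of HttpServletResponseWrapper whose getWriter() returns the buffering PrintWriter, and the filter would call countStartTags after chain.doFilter(request, wrapper) returns. The counting logic mirrors the Detector class from the question.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: stands in for the buffering half of a response wrapper.
final class CapturedHtml {
    // Everything written through getWriter() ends up in this in-memory buffer.
    private final CharArrayWriter buffer = new CharArrayWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    // A response wrapper would hand this writer to the JSP instead of the real one.
    PrintWriter getWriter() {
        return writer;
    }

    // Returns the markup written so far.
    String body() {
        writer.flush();
        return buffer.toString();
    }

    // Counts start tags in the given markup, like Detector in the question.
    static Map<String, Integer> countStartTags(String html) {
        final Map<String, Integer> counts = new TreeMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                counts.merge(t.toString(), 1, Integer::sum);
            }
        };
        try {
            // StringReader replaces the FileReader used in the question's Main class.
            new ParserDelegator().parse(new StringReader(html), callback, false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

With this in place, the only change to the original approach is the source of the Reader: a StringReader over the captured response body instead of a FileReader over a file on disk. Note that the Swing parser may also report tags it implies on its own (such as head), so the counts can include tags that are not literally present in the markup.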

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow