Apache HttpClient, User-Agent, and corporate proxy

https://stackoverflow.com/questions/13664526

04-12-2021

Question

This is somewhat of a speculative question in that the answer may not be apparent in the info I have available, but I am hoping that someone with sufficient experience will recognize a likely answer based on common practices for corporate proxies.

I work (not as a software developer) behind a corporate proxy. In my spare time I was messing around with a Java program I'm developing. This program needs to make a few very simple HTTP GET requests, and I'm using Apache HttpClient for that. I was concerned at first about whether or not I'd make it through the proxy server. In our web browsers, the proxy server is simply entered into the network settings... no authentication needed. So, I added the following to my Java program:

myClient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, MY_PROXY);

Sure enough, it worked! However, I had another concern. The HTTP requests coming from my program probably had some strange User-Agent specified (I've since confirmed this is the case), and I did not want them to ever trigger any sort of suspicion in automated or manual packet inspections. So I said to myself, "why not just set the User-Agent header to be the same as the browser on this machine?"

myClient.getParams().setParameter(CoreProtocolPNames.USER_AGENT, BROWSER_AGENT);

Here is where it gets weird. If the BROWSER_AGENT string above is set to exactly the same value as the corporate-supplied browser on my machine (either IE or FF), I get an "authentication failed, missing credentials" type of error message back from the corporate proxy server. But if I set the User-Agent header to something generic, say Mozilla/5.0, or to a totally bogus string, or even to an empty string, it all works fine! The parts that confuse me are:

1. When User-Agent is set to the same value as my browser's (a long, complex string), I "fail authentication" somehow, which makes no sense since in the real browser I provide no authentication information (unless it comes from some pre-installed certificate, maybe?).
2. If the corporation requires authentication for any requests sent to the proxy server on port 80, then how come they let random User-Agent strings get through? Oversight? Some other reason I can't comprehend?

Hopefully this question is not too speculative to be deemed constructive. I'd love to hear from people with experience in this area. Thanks.

Solution

By default, HttpClient identifies itself in the User-Agent header, with a string of the form Apache-HttpClient/<version>. As you have seen, you can override this with any string you want.
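
For comparison, here is a rough sketch of the same two settings (a proxy without authentication plus an overridden User-Agent) using the JDK's built-in java.net.http client (Java 11+) instead of Apache HttpClient, so it runs without extra dependencies. The class name, the proxy address 10.0.0.1:8080, the target URL, and the User-Agent value are all placeholders I made up for illustration, not values from the question.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

// Sketch only: class and method names are hypothetical.
final class ProxyUserAgentSketch {

    // Build a client that routes its requests through the given proxy.
    // No proxy credentials are configured, matching the browser setup in the question.
    static HttpClient buildClient(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .build();
    }

    // Build a GET request that carries an explicit User-Agent header
    // instead of the client's default one.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder proxy address and User-Agent (assumptions, not real values).
        HttpClient client = buildClient("10.0.0.1", 8080);
        HttpRequest request = buildRequest("http://example.com/", "Mozilla/5.0");

        // Print the headers that would be sent; nothing goes over the network here.
        System.out.println(request.headers().map());
    }
}
```

To actually send the request you would call client.send(request, HttpResponse.BodyHandlers.ofString()). Trying a few different User-Agent values this way is a quick way to see which ones the proxy challenges.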

It looks like your proxy server is configured to handle user credentials automatically based on browser type. However, probably because some client ran into trouble with that, your admin seems to have added an exception rule: when the User-Agent is not recognized, just let the request through. Personally, I think this is a very bad security policy since, as you found out, any program can get through your proxy without authentication just by sending a bogus User-Agent.
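
To make the guessed rule concrete, here is a toy model of it in Java. This is purely an illustration of the speculation above, not how any real proxy product is configured; the class name, the Decision enum, and the list of "known" agents are all hypothetical.

```java
import java.util.Set;

// Toy model of the speculated proxy rule; not real proxy configuration.
final class ProxyPolicySketch {

    enum Decision { REQUIRE_AUTH, ALLOW }

    // Hypothetical set of User-Agent strings the proxy recognizes as corporate browsers.
    private final Set<String> knownBrowserAgents;

    ProxyPolicySketch(Set<String> knownBrowserAgents) {
        this.knownBrowserAgents = Set.copyOf(knownBrowserAgents);
    }

    // Recognized browsers are challenged for credentials;
    // anything else (bogus, empty, or missing) falls into the exception rule.
    Decision decide(String userAgent) {
        if (userAgent != null && knownBrowserAgents.contains(userAgent)) {
            return Decision.REQUIRE_AUTH;
        }
        return Decision.ALLOW;
    }
}
```

Under this model, a program that copies the browser's exact User-Agent gets challenged (and fails, since it sends no credentials), while a program sending any unrecognized string is waved through, which matches the behavior described in the question.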

Licensed under: CC-BY-SA with attribution
