Question

I am having problems understanding Net::HTTP and Nokogiri.

I have a large number of jobs on my Jenkins server. I have to periodically update the branch name on these jobs. Doing it from the UI is a cumbersome process so I decided to update the Jenkins config.xml.

I use Nokogiri to parse the XML, use XPath to find the node, and update its value. However, when I POST the updated XML back to Jenkins, I get a 500 error saying:

Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.

Here is what I am doing:

require "net/http"
require "nokogiri"

uri = URI.parse("http://jenkins.my.domain.web:8080")
http = Net::HTTP.new(uri.host, uri.port)

getQueueRequest = Net::HTTP::Get.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
getQueue = http.request(getQueueRequest)

xml_doc = Nokogiri::HTML(getQueue.body)

# Get current branch name
branch_name=xml_doc.at_xpath('//hudson.plugins.git.branchspec/name')

# Get new branch name
print "Enter new branch name "
user_input = gets.chomp
new_branch_name = user_input.downcase

# Set branch name and create xml
branch_name.content=new_branch_name
new_config_xml=xml_doc.to_xml

puts "Logging into Jenkins"

update_branch = Net::HTTP::Post.new("http://jenkins.my.domain.web:8080/my/job/location/config.xml")
update_branch.basic_auth 'username', 'password'
update_branch.body = new_config_xml

response = http.request(update_branch)

puts response.body

I suspect it has something to do with the XML that gets added to the request body, but I'm not sure how to fix it.

Original XML:

<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
  <actions/>
  <description></description>
  <keepDependencies>false</keepDependencies>
  <properties>
    <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
      <maxConcurrentPerNode>0</maxConcurrentPerNode>
      <maxConcurrentTotal>0</maxConcurrentTotal>
      <categories/>
      <throttleEnabled>false</throttleEnabled>
      <throttleOption>project</throttleOption>
      <configVersion>1</configVersion>
    </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
  </properties>
  <scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
    <configVersion>2</configVersion>
    <userRemoteConfigs>
      <hudson.plugins.git.UserRemoteConfig>
        <name></name>
        <refspec></refspec>
        <url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
      </hudson.plugins.git.UserRemoteConfig>
    </userRemoteConfigs>
    <branches>
      <hudson.plugins.git.BranchSpec>
        <name>release</name>
      </hudson.plugins.git.BranchSpec>
    </branches>
    <disableSubmodules>false</disableSubmodules>
    <recursiveSubmodules>false</recursiveSubmodules>
    <doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
    <authorOrCommitter>false</authorOrCommitter>
    <clean>false</clean>
    <wipeOutWorkspace>false</wipeOutWorkspace>
    <pruneBranches>false</pruneBranches>
    <remotePoll>false</remotePoll>
    <ignoreNotifyCommit>false</ignoreNotifyCommit>
    <useShallowClone>false</useShallowClone>
    <buildChooser class="hudson.plugins.git.util.DefaultBuildChooser"/>
    <gitTool>Default</gitTool>
    <submoduleCfg class="list"/>
    <relativeTargetDir></relativeTargetDir>
    <reference></reference>
    <excludedRegions></excludedRegions>
    <excludedUsers></excludedUsers>
    <gitConfigName></gitConfigName>
    <gitConfigEmail></gitConfigEmail>
    <skipTag>false</skipTag>
    <includedRegions></includedRegions>
    <scmName></scmName>
  </scm>
  <canRoam>true</canRoam>
  <disabled>false</disabled>
  <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
  <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
  <triggers class="vector">
    <hudson.triggers.TimerTrigger>
      <spec>0 22 * * 4</spec>
    </hudson.triggers.TimerTrigger>
  </triggers>
  <concurrentBuild>false</concurrentBuild>
  <rootModule>
    <groupId>com.org.project.test</groupId>
    <artifactId>functest</artifactId>
  </rootModule>
  <goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
  <mavenName>apache-maven-3.0.4</mavenName>
  <aggregatorStyleBuild>true</aggregatorStyleBuild>
  <incrementalBuild>false</incrementalBuild>
  <perModuleEmail>true</perModuleEmail>
  <ignoreUpstremChanges>false</ignoreUpstremChanges>
  <archivingDisabled>false</archivingDisabled>
  <resolveDependencies>false</resolveDependencies>
  <processPlugins>false</processPlugins>
  <mavenValidationLevel>-1</mavenValidationLevel>
  <runHeadless>false</runHeadless>
  <disableTriggerDownstreamProjects>false</disableTriggerDownstreamProjects>
  <settings class="jenkins.mvn.DefaultSettingsProvider"/>
  <globalSettings class="jenkins.mvn.DefaultGlobalSettingsProvider"/>
  <reporters/>
  <publishers/>
  <buildWrappers/>
  <prebuilders/>
  <postbuilders/>
  <runPostStepsIfResult>
    <name>FAILURE</name>
    <ordinal>2</ordinal>
    <color>RED</color>
  </runPostStepsIfResult>
</maven2-moduleset>

After Editing and Massaging:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <body>
    <maven2-moduleset plugin="maven-plugin@1.504">
      <actions />
      <description />
      <keepdependencies>false</keepdependencies>
      <properties>
        <hudson.plugins.throttleconcurrents.throttlejobproperty plugin="throttle-concurrents@1.7.2">
          <maxconcurrentpernode>0</maxconcurrentpernode>
          <maxconcurrenttotal>0</maxconcurrenttotal>
          <categories />
          <throttleenabled>false</throttleenabled>
          <throttleoption>project</throttleoption>
          <configversion>1</configversion>
        </hudson.plugins.throttleconcurrents.throttlejobproperty>
      </properties>
      <scm class="hudson.plugins.git.GitSCM" plugin="git@1.4.0">
        <configversion>2</configversion>
        <userremoteconfigs>
          <hudson.plugins.git.userremoteconfig>
            <name />
            <refspec />
            <url>git@github.com:<ORG_NAME>/<REPO_NAME>.git</url>
          </hudson.plugins.git.userremoteconfig>
        </userremoteconfigs>
        <branches>
          <hudson.plugins.git.branchspec>
            <name>master</name>
          </hudson.plugins.git.branchspec>
        </branches>
        <disablesubmodules>false</disablesubmodules>
        <recursivesubmodules>false</recursivesubmodules>
        <dogeneratesubmoduleconfigurations>false</dogeneratesubmoduleconfigurations>
        <authororcommitter>false</authororcommitter>
        <clean>false</clean>
        <wipeoutworkspace>false</wipeoutworkspace>
        <prunebranches>false</prunebranches>
        <remotepoll>false</remotepoll>
        <ignorenotifycommit>false</ignorenotifycommit>
        <useshallowclone>false</useshallowclone>
        <buildchooser class="hudson.plugins.git.util.DefaultBuildChooser" />
        <gittool>Default</gittool>
        <submodulecfg class="list" />
        <relativetargetdir />
        <reference />
        <excludedregions />
        <excludedusers />
        <gitconfigname />
        <gitconfigemail />
        <skiptag>false</skiptag>
        <includedregions />
        <scmname />
      </scm>
      <canroam>true</canroam>
      <disabled>false</disabled>
      <blockbuildwhendownstreambuilding>false</blockbuildwhendownstreambuilding>
      <blockbuildwhenupstreambuilding>false</blockbuildwhenupstreambuilding>
      <triggers class="vector">
        <hudson.triggers.timertrigger>
          <spec>0 22 * * 4</spec>
        </hudson.triggers.timertrigger>
      </triggers>
      <concurrentbuild>false</concurrentbuild>
      <rootmodule>
        <groupid>com.org.project.test</groupid>
        <artifactid>functest</artifactid>
      </rootmodule>
      <goals>clean verify -Dtestsuite=<test_suite_name> -Dbrowser=chrome -Dipaddress=http://<IP_ADDRESS>:4444/wd/hub</goals>
      <mavenname>apache-maven-3.0.4</mavenname>
      <aggregatorstylebuild>true</aggregatorstylebuild>
      <incrementalbuild>false</incrementalbuild>
      <permoduleemail>true</permoduleemail>
      <ignoreupstremchanges>false</ignoreupstremchanges>
      <archivingdisabled>false</archivingdisabled>
      <resolvedependencies>false</resolvedependencies>
      <processplugins>false</processplugins>
      <mavenvalidationlevel>-1</mavenvalidationlevel>
      <runheadless>false</runheadless>
      <disabletriggerdownstreamprojects>false</disabletriggerdownstreamprojects>
      <settings class="jenkins.mvn.DefaultSettingsProvider" />
      <globalsettings class="jenkins.mvn.DefaultGlobalSettingsProvider" />
      <reporters />
      <publishers />
      <buildwrappers />
      <prebuilders />
      <postbuilders />
      <runpoststepsifresult>
        <name>FAILURE</name>
        <ordinal>2</ordinal>
        <color>RED</color>
      </runpoststepsifresult>
    </maven2-moduleset>
  </body>
</html>

Solution

When you use Nokogiri::HTML(some_html) or Nokogiri::XML(some_xml), Nokogiri will look to see if the content is valid. If it isn't, it will do fix-ups on the content in an attempt to make it so. For instance:

require 'nokogiri'

html_fragment = "<p>foo bar</p>"
Nokogiri::HTML(html_fragment).to_html 
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"

Even when the document is already complete HTML, Nokogiri still adds the DOCTYPE declaration:

html = "<html><body><p>foo bar</p></body></html>"
Nokogiri::HTML(html).to_html 
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foo bar</p></body></html>\n"

If you want Nokogiri to leave the document alone, because it's supposed to be a fragment, tell it so:

Nokogiri::HTML::DocumentFragment.parse(html_fragment).to_html 
# => "<p>foo bar</p>"

Or:

xml_fragment = "<x>foo bar</x>"
Nokogiri::XML::DocumentFragment.parse(xml_fragment).to_xml 
# => "<x>foo bar</x>"

Nokogiri is pretty smart about handling XML and HTML. You can try to confuse it and it'll generally do the right thing:

xml_fragment = "<x>foo bar</x>"
Nokogiri::HTML::DocumentFragment.parse(xml_fragment).to_xml 
# => "<x>foo bar</x>"

That's parsing XML as an HTML fragment and telling it to emit it as XML.

With all that said, Nokogiri isn't doing anything mysterious, so here's how to fix the problem. First, parse the content as XML so Nokogiri doesn't think it should add the HTML DOCTYPE declaration; then, if the XML is syntactically correct, tell Nokogiri it's OK to parse it as a complete document:

require 'nokogiri'

xml = %{<?xml version='1.0' encoding='UTF-8'?>
<maven2-moduleset plugin="maven-plugin@1.504">
  <actions/>
  <description></description>
  <keepDependencies>false</keepDependencies>
  <properties>
    <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
    </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
  </properties>
</maven2-moduleset>
}
puts Nokogiri::XML.parse(xml).to_xml 

# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >>   <actions/>
# >>   <description/>
# >>   <keepDependencies>false</keepDependencies>
# >>   <properties>
# >>     <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >>     </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >>   </properties>
# >> </maven2-moduleset>

Or as a fragment, which, because it's complete, will result in the same thing:

puts Nokogiri::XML::DocumentFragment.parse(xml).to_xml 

# >> <?xml version='1.0' encoding='UTF-8'?>
# >> <maven2-moduleset plugin="maven-plugin@1.504">
# >>   <actions/>
# >>   <description/>
# >>   <keepDependencies>false</keepDependencies>
# >>   <properties>
# >>     <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@1.7.2">
# >>     </hudson.plugins.throttleconcurrents.ThrottleJobProperty>
# >>   </properties>
# >> </maven2-moduleset>

Instead of using Net::HTTP, which provides the bare building blocks for HTTP, I'd recommend looking at something a bit higher-level, like HTTPClient. Here's code that is similar to yours:

require 'httpclient'
require 'nokogiri'

URL = 'http://jenkins.my.domain.web:8080/my/job/location/config.xml'

http_client = HTTPClient.new
# Register credentials for this URL up front (a nil domain would apply them to every URL):
http_client.set_auth(URL, 'user', 'password')

# Parse as XML, not HTML, so tag case is kept and no HTML DOCTYPE is added:
xml_doc = Nokogiri::XML(
  http_client.get_content(URL)
)

# Get current branch name. XPath is used because in CSS the dots in
# "hudson.plugins.git.BranchSpec" would be read as class selectors:
branch_name = xml_doc.at_xpath('//hudson.plugins.git.BranchSpec/name')

# Get new branch name
print 'Enter new branch name '
new_branch_name = gets.chomp.downcase

# Set branch name
branch_name.content = new_branch_name

puts 'Posting the updated config.xml to Jenkins'

response = http_client.post(URL, :body => xml_doc.to_xml)
puts response.status

I can't test it but it looks close.
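If you'd rather stay with Net::HTTP, as in the question, here is a sketch of building the equivalent POST. It only builds the request and does not send it; `build_config_post` is a hypothetical helper name, the URL and credentials are the question's placeholders, and the `Content-Type: application/xml` header is an assumption rather than something confirmed above:

```ruby
require 'net/http'
require 'uri'

# Placeholder job URL from the question.
CONFIG_URI = URI('http://jenkins.my.domain.web:8080/my/job/location/config.xml')

# Hypothetical helper: builds (but does not send) the POST that uploads the edited config.
# new_config_xml stands in for the output of Nokogiri::XML(...).to_xml.
def build_config_post(uri, new_config_xml, user, password)
  request = Net::HTTP::Post.new(uri)            # accepts a URI object and uses its path
  request.basic_auth(user, password)
  request['Content-Type'] = 'application/xml'   # assumption: declare the body as XML
  request.body = new_config_xml
  request
end

request = build_config_post(CONFIG_URI, "<?xml version=\"1.0\"?>\n<project/>\n", 'username', 'password')

# Sending it needs a reachable Jenkins, so it is left commented out:
# response = Net::HTTP.start(CONFIG_URI.host, CONFIG_URI.port) { |http| http.request(request) }
# puts response.code
```

The important part is still the body: it must come from `Nokogiri::XML(...).to_xml`, not from a document parsed with `Nokogiri::HTML`.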


I now find myself in another dilemma. The methods for navigating to elements and editing their values, like at_xpath and at_css, seem to work only with Nokogiri::HTML or Nokogiri::HTML::DocumentFragment; they don't work when I use Nokogiri::XML. Using Nokogiri::HTML changes the case of the tags: <keepDependencies>false</keepDependencies> becomes <keepdependencies>false</keepdependencies>. Jenkins does accept the XML with the changed case of tags. The to_html and to_xml methods just return a string, so I can't use the XPath or CSS methods to navigate the XML tree. Is there a way around this?

The at methods work with both XML and HTML documents and accept both CSS and XPath selectors; everything inside Nokogiri is really XML-based.

Nokogiri folds HTML tags to lower-case because HTML is case-insensitive, so at expects a lower-case value when dealing with HTML. XML is case-sensitive, so Nokogiri leaves the tag case alone, and at requires you to use the correct case when using CSS.

This is documented in the Nokogiri docs:

Note that the CSS query string is case-sensitive with regards to your document type. That is, if you’re looking for “H1” in an HTML document, you’ll never find anything, since HTML tags will match only lowercase CSS queries. However, “H1” might be found in an XML document, where tags names are case-sensitive (e.g., “H1” is distinct from “h1”).

OTHER TIPS

When you are parsing the XML you are receiving from the service, you are declaring it as HTML:

xml_doc = Nokogiri::HTML(getQueue.body)

And this appears to cause Nokogiri to add HTML nodes.

Try parsing it as XML instead:

xml_doc = Nokogiri::XML(getQueue.body)

Licensed under: CC-BY-SA with attribution