Question

I'm maintaining code written by someone just before they retired, which means I can't find them to ask questions. :-) This is basically a C++ wrapper to launch a program. The chunk of code in question is this:

// Note: the cast discards const; CreateProcess is allowed to modify the
// command-line buffer (documented for the Unicode version, CreateProcessW).
BOOL bSuccess = CreateProcess(NULL, (char *)strBatFile.c_str(),
    NULL, NULL, TRUE, CREATE_NO_WINDOW, NULL, strLocalWorkingDir.c_str(), &si, &pi);

if( bSuccess )
{
   DWORD dwMillisec = INFINITE;
   DWORD dwWaitStatus = WaitForSingleObject( pi.hProcess, dwMillisec );

   if( dwWaitStatus == WAIT_OBJECT_0 )
   {
      DWORD dwExitCode = 0;   // DWORD is an integer type, so 0 rather than NULL
      GetExitCodeProcess( pi.hProcess, &dwExitCode );
      nRet = (int)dwExitCode;
   }

   CloseHandle( pi.hThread );
   CloseHandle( pi.hProcess );
}
else
   nRet = START_PROCESS_FAILED;

If just one instance runs at a time, it always works fine. If several are launched within a very short time frame, though, about half of them end up with dwExitCode set to 1 instead of 0, even though the process isn't crashing and the log file the internal program writes is complete.

So to clarify: the process always starts fine and always gets into the if statements, but the value GetExitCodeProcess puts in dwExitCode isn't what I expect. Since we error-check on this, we're flagging a bunch of these runs as incomplete when they are in fact fine.

Is there any way this value could be set to something different than the process exit code? And/or is there a utility I could run at the same time to confirm the exit codes are what I think they are?

Thanks!

ETA: Just realised this puts the internal program call in a .bat file - "C:\\ --flags etc..." - and then passes that file as the command line in the second argument, rather than calling the program directly via lpApplicationName. No idea if this makes a difference! When I print the PID of the process, I can see it belongs to a cmd.exe process, with our program as a child PID. However, when I trace in Process Monitor, I can see that both parent and child are exiting with exit code 0.


Solution

Found it! The application itself was in fact returning exit code 0... it was the shell around it that was returning 1, and that was due to the .bat file in the second argument. The file name was generated from the time, so it ended up exactly the same when multiple instances were started too close together. This is why the inner app would run fine: there was always a bat file there with that name. But there were access collisions when the different instances tried to generate or clean up the bat file, from what I can tell.
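To make the collision concrete, here is a minimal sketch. The real name-generation code isn't shown above, so everything here is an assumption: the function names, the "run_" prefix, and the idea that the name came from a second-resolution timestamp are all hypothetical. It only illustrates why two launches in the same second get the same file name, and how mixing in a per-process value keeps them apart.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical reconstruction: a name built only from a timestamp with
// one-second resolution. Two calls within the same second give the same name.
std::string TimeBasedBatName(std::time_t t) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "run_%lld.bat", static_cast<long long>(t));
    return buf;
}

// Same idea, with a process id mixed in. On Windows the pid would come from
// GetCurrentProcessId(); it is a plain parameter here to keep the sketch portable.
std::string PidSuffixedBatName(std::time_t t, unsigned long pid) {
    char buf[96];
    std::snprintf(buf, sizeof buf, "run_%lld_%lu.bat",
                  static_cast<long long>(t), pid);
    return buf;
}
```

With these, TimeBasedBatName(t) called from two processes in the same second returns identical strings, while PidSuffixedBatName(t, pid) differs as long as the two pids differ.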

As a proof-of-concept hack I just added the current PID to the end of the file name, and everything worked perfectly. Now I just need to decide on the real fix, which will likely be getting rid of the bat file mechanism entirely and calling the app directly.
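For the "call the app directly" option, a rough sketch of what that could look like follows. It is not the original wrapper's code: BuildCommandLine, ToMutableBuffer and RunDirect are hypothetical helper names, the -1 return value is a placeholder for whatever error constant the wrapper really uses, and the quoting is deliberately simplistic (it wraps pieces containing whitespace in quotes but does not escape embedded quotes or trailing backslashes). The Windows-only part is guarded so the helpers can be compiled anywhere.

```cpp
#include <string>
#include <vector>

// Joins the executable path and its arguments into one command line,
// quoting any piece that is empty or contains a space or tab (simplified).
std::string BuildCommandLine(const std::string& exe,
                             const std::vector<std::string>& args) {
    auto quote = [](const std::string& s) -> std::string {
        if (!s.empty() && s.find_first_of(" \t") == std::string::npos) return s;
        return "\"" + s + "\"";
    };
    std::string cmd = quote(exe);
    for (const auto& a : args) {
        cmd += ' ';
        cmd += quote(a);
    }
    return cmd;
}

// CreateProcess takes a writable command-line buffer, so copy the string
// into a NUL-terminated vector instead of casting away const.
std::vector<char> ToMutableBuffer(const std::string& s) {
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    return buf;
}

#ifdef _WIN32
#include <windows.h>

// Launches exe directly (no .bat, no cmd.exe) and returns its exit code,
// or -1 (placeholder) if the launch or the exit-code query fails.
int RunDirect(const std::string& exe, const std::vector<std::string>& args,
              const std::string& workDir) {
    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};
    std::vector<char> cmd = ToMutableBuffer(BuildCommandLine(exe, args));

    if (!CreateProcessA(exe.c_str(), cmd.data(), NULL, NULL, TRUE,
                        CREATE_NO_WINDOW, NULL, workDir.c_str(), &si, &pi))
        return -1;

    WaitForSingleObject(pi.hProcess, INFINITE);
    DWORD code = 0;
    BOOL ok = GetExitCodeProcess(pi.hProcess, &code);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return ok ? static_cast<int>(code) : -1;
}
#endif
```

Because the handle in pi.hProcess then belongs to the program itself rather than to a cmd.exe wrapper, the value from GetExitCodeProcess is the program's own exit code, and there is no temporary file for concurrent instances to fight over.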

Whew! Thanks for all the help, everyone! Sadly, the bit of code I included didn't have the offending line, but everyone's tips above helped me narrow in on the problem. :-)

Licensed under: CC-BY-SA with attribution