Question

Background: I have a .NET 3.5 WPF "Prism"-based application running on Windows XP and Windows Embedded POSReady 2009 PCs. The PCs are shut down every night (via a C# call to "shutdown.exe") and booted fresh in the morning (via Wake-on-LAN). The application is touch-based (using ELO touch screens); there are no mice or keyboards attached, and the users do not have access to Windows.

Issue: We sporadically see one of two things happen: either the application doesn't seem to load correctly and shows a blank white form, or it stops responding to touch. In both cases our (log4net) logs show that we are still handling and logging the touch events. This often seems to occur when switching views, and the logs also show the Prism RegionManager removing and adding views appropriately.

Troubleshooting: The application is running on approximately 100 PCs using images applied with Clonezilla, and the problem occurs only sporadically. Since it isn't happening on all PCs, and there are no exceptions logged or anything indicative of an issue in the Event Viewer, we've resorted to PC- and OS-level fixes. Specifically, we tried restarting the application and the PCs, with occasional short-term success: sometimes the application functions correctly after these restarts, but only for a matter of hours at most. We've also worked under the assumption that the application had somehow been corrupted, and removed and reinstalled it, without success.

The only thing that seems to resolve the issue is a repair of the .NET Framework using the provided .NET 3.5 SP1 installer package.

Conclusion: Since this seems to resolve the issue when nothing else does, it appears that we are somehow corrupting a GAC'd framework DLL, either through code or through the boot/shutdown procedures on the PC.

Questions: This leads to a number of questions:

  • Any ideas on how we can further identify the source of the issue?
  • Any ideas on what we can do to prevent this issue?
  • Any ideas on what the underlying issue might be?

Thanks for any help.

Solution

We were finally able to get a hold of a production machine exhibiting this behavior and through a number of troubleshooting steps, including sending dump files to Microsoft, the issue was located.

The WPF Font Cache Windows service was occasionally getting into a corrupted state, causing a simple cache request to block indefinitely. This hang caused all of the behaviors described above in our WPF application.

Simple solution: stop and disable the service. After disabling the service and rebooting the PC the service is no longer in use and we don't see any of these issues. In theory this leads to longer application load times, but we have seen zero negative impact.

Note that there are two versions of the service: 3.0.0.0 and 4.0.0.0. If your application targets .NET 3.0 or 3.5 you'll need to disable the 3.0.0.0 service, and if it targets 4.0+ you'll need to disable the 4.0.0.0 service.
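
For anyone who wants to script the stop-and-disable step across many machines, here is a minimal sketch that shells out to sc.exe. The service names "FontCache3.0.0.0" and "WPFFontCache_v0400" are my assumption of what a default install registers (check with "sc query" on your own image), the class and method names are made up for illustration, and it has to run with administrator rights.

```csharp
using System;
using System.Diagnostics;

public static class FontCacheServiceDisabler
{
    // Assumed service names for default installs; verify with "sc query".
    public const string Net3ServiceName = "FontCache3.0.0.0";
    public const string Net4ServiceName = "WPFFontCache_v0400";

    // Picks the font cache service that matches the targeted framework.
    public static string ServiceNameFor(Version targetFramework)
    {
        return targetFramework.Major >= 4 ? Net4ServiceName : Net3ServiceName;
    }

    public static string BuildStopArguments(string serviceName)
    {
        return "stop \"" + serviceName + "\"";
    }

    // sc.exe requires the space after "start=".
    public static string BuildDisableArguments(string serviceName)
    {
        return "config \"" + serviceName + "\" start= disabled";
    }

    // Runs sc.exe with the given arguments and returns its exit code.
    public static int RunSc(string arguments)
    {
        var startInfo = new ProcessStartInfo("sc.exe", arguments);
        startInfo.UseShellExecute = false;
        startInfo.CreateNoWindow = true;

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    public static void StopAndDisable(Version targetFramework)
    {
        string name = ServiceNameFor(targetFramework);
        RunSc(BuildStopArguments(name));    // non-zero if the service is already stopped
        RunSc(BuildDisableArguments(name)); // keeps it from starting again on demand
    }
}
```

Calling FontCacheServiceDisabler.StopAndDisable(new Version(3, 5)) would target the 3.0.0.0 service. A wedged service may ignore the stop request, so the reboot mentioned above is still the safe way to make sure it is really gone.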

Thanks to all for your comments and suggestions.

Other tips

We have had problems reminiscent of yours with our WPF application when connecting touch screens. They were due to a bug in the automation framework in .NET, which caused our application to either become very slow or hang the GUI thread entirely.

You can read more about the problem at: http://social.msdn.microsoft.com/Forums/en-IE/windowsaccessibilityandautomation/thread/6c4465e2-207c-4277-a67f-e0f55eff0110

The workaround suggested in the thread above, periodically removing any listeners of automation events, worked for us.

This is not a real answer but since I do not have enough rep? (I guess) I can't use the comment function :)

Try a global error handler and see what it produces.

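    using System;
    using System.IO;
    using System.Windows;
    using System.Windows.Threading;

    // With a hand-written Main, set the Build Action of App.xaml to "Page"
    // so the compiler does not generate a second entry point.
    public partial class App : Application
    {
        [STAThread]
        public static void Main()
        {
            var application = new App();

            // Catch any exception that escapes on the UI (dispatcher) thread.
            application.DispatcherUnhandledException += application_DispatcherUnhandledException;

            application.InitializeComponent();
            application.Run();
        }

        static void application_DispatcherUnhandledException(object sender, DispatcherUnhandledExceptionEventArgs e)
        {
            // Mark as handled so the app exits through Shutdown() instead of crashing.
            e.Handled = true;
            LogAndClose("Global exception: " + e.Exception);
        }

        public static void Log(string text)
        {
            try
            {
                // Write next to the executable; Environment.CurrentDirectory
                // depends on how the process was launched.
                string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Log.txt");
                File.AppendAllText(path,
                    "[" + DateTime.Now.ToString("MM/dd/yy HH:mm:ss") + "] " + text + Environment.NewLine);
            }
            catch { } // Logging must never throw.
        }

        public static void LogAndClose(string text)
        {
            Log(text);

            try
            {
                Application.Current.Shutdown();
            }
            catch { } // Nothing sensible left to do if shutdown itself fails.
        }
    }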
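
One caveat: DispatcherUnhandledException only sees exceptions raised on the UI thread. As a rough sketch, something like the following would also record exceptions thrown on background threads through AppDomain.CurrentDomain.UnhandledException. The BackgroundExceptionLogger class and its Describe helper are hypothetical names of mine, not part of any framework.

```csharp
using System;

public static class BackgroundExceptionLogger
{
    // Formats one log entry; kept separate so it can be checked without crashing a thread.
    public static string Describe(string source, object exceptionObject)
    {
        return source + ": " + (exceptionObject ?? "(no exception object)");
    }

    // Subscribes to exceptions that no thread in the current AppDomain handled.
    // The process normally still terminates afterwards; this only records the cause.
    public static void Install(Action<string> log)
    {
        AppDomain.CurrentDomain.UnhandledException +=
            delegate(object sender, UnhandledExceptionEventArgs e)
            {
                log(Describe("AppDomain exception (terminating=" + e.IsTerminating + ")",
                             e.ExceptionObject));
            };
    }
}
```

You would call BackgroundExceptionLogger.Install(App.Log) in Main before application.Run(). It will not catch a hang, of course, only exceptions.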

Have you tried remote debugging the production system?

What you need for remote debugging:

  • deploy msvsmon.exe (the Visual Studio remote debugging monitor) to the production machine
  • a network connection between your development and production systems
  • make sure your local and remote versions of the code are in sync. You can also build on your dev machine and xcopy-deploy your debug build to the remote machine. If it's pure .NET code, that is easy. If you also have C++ code, make sure the debug versions of the C++ DLLs are on the production machine. Or build the release version and remote debug that.
  • set up a user account for the connection. This is actually a bit tricky. Google "remote debugging credentials" for a few tips.
  • don't forget to disable all firewalls!

You can attach to an already running process, but you can also start the app from inside Visual Studio.

If your development system is located far away from the production system, use a laptop and remote desktop to bring Visual Studio to the production system. I do this routinely. Even a five-metre distance between the two is annoying.

I can elaborate on this if there's interest, or if you run into trouble setting up the connection.

Good luck!

Try using ANTS profiler to see if you have a memory leak. The two-week trial version they offer is enough to find out.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow