How to get text from the screen

https://stackoverflow.com/questions/514959

21-08-2019
|

Question

There is some Win OS API call or so that would let one obtain text from the screen

not via obtaining a snapshot and then doing OCR on it, but via API

the idea is to get the text that is under the mouse that the user points to and clicks on.

This is how tools like Babylon (http://www.babylon.com) and 1-Click Answers (http://www.answers.com/main/download_answers_win.jsp) and many others work.

Can someone point me to the right direction to get this functionality?

Solution

There is no direct way to obtain text. An application could render text in a zillion different ways (Windows API being one of them), and after it's rendered - it's just a bunch of pixels.

A method you could try however is to find the window directly under the mouse and trying to get the text from them. This would work fine on most standard Windows controls (labels, textboxes, etc.) Wouldn't work on Internet browsers though.

I think the best you can do is make your application such that it supports as many different (common) controls as possible in the above described manner.

OTHER TIPS

You can get the text of every window with the GetWindowText API. The mouse position can be found with the GetCursorPos API.

In Delphi you could use this function (kudos to Peter Below)

Function ChildWindowUnderCursor: HWND;
Var
  hw, lasthw: HWND;
  pt, clientpt: TPoint;
Begin
  Result := 0;
  GetCursorPos( pt );
  // find top-level window under cursor
  hw := WindowFromPoint( pt );
  If hw = 0 Then Exit;

  // look for child windows in the window recursively
  // until we find no new windows
  Repeat
    lasthw := hw;
    clientpt := Pt;
    Windows.ScreenToClient( lasthw, clientpt );
    // Use ChildwindowfromPoint if app needs to run on NT 3.51!
    hw := ChildwindowFromPointEx( lasthw, clientpt, CWP_SKIPINVISIBLE );
  Until hw = lasthw;
  Result := hw;
End;

Regards,
Lieven

Windows has APIs for accessibility tools like screen-readers for the blind. (Newer versions are also used for other purposes, like UI automation and testing.) It works with many applications, even most browsers which render their own content without using the standard Windows controls. It won't work with all applications, but it can be used to figure out the text under the mouse in most cases.

The current API is called the Windows Automation API. Describing how to do this in general is beyond the scope of a Stack Overflow answer, so I've simply provided a link to the documentation.

The older API that was widely available when this question was first posted is called the Microsoft Active Accessibility API. As with the modern APIs, the scope here is too broad to detail here.

Note that documentation for both APIs is written both for both developers building accessibility tools (like screen readers) as well as for developers writing apps that want to be compatible with those accessibility tools.

The basic idea is that an accessibility tool gets COM interfaces provided by the target application's window(s), and it can use those interfaces to figure out the controls and their text and how they're related both logically and spatially. Applications that are composed of standard Windows controls are mostly automatically supported. Applications with custom UI implementations have to do work to provide these interfaces. Fortunately, the important ones, like the mainstream browsers, have done the work to support these interfaces.

i think its called the clipboard. i am going to bet these programs inject click and double click & keyboard events and then copy items there for inspection. Alternatively, they are gettin jiggy with the windows text controls, and grabbing content that way. i suspect due to security issues, these tools have problems running in vista also.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow