It can be done much easier.
Suppose you have two copies of the array: one sorted by Y-axis and another by X-axis. Now you'll iterate through the Y-sorted array and for each point (let name it cur) you should binary search an appropriate point (with the smallest p2.x - p1.x) in the X-sorted array. In case of binary search will find the same point or the point with Y-coordinate less than cur+D you should just delete that point from X-sorted array (we'll never need that point in X-sorted array again because we only increase Y-coordinate) and run binary search again. The answer will be the smallest of the binary search results.
As we need fast timing we should erase points from array quickly. It can be done by using binary tree - it can erase any node in O(logN) time and can do binary search in O(logN) time. As you delete each node from the tree only once and it takes O(logN + logN) time - total time will be O(N * logN). Preprocesssing time is O(N * logN) too. Also the memory taken will be O(N).
By the way your solution is also appropriate because actual N is 10^5 not 10^6. Which allows your solution to keep timing below 2 seconds, and to use less than 20MB of memory.