Parallel vs. distributed computing: the dividing line

Question 1

Parallel computing:

A single application or process is split into parts that are executed concurrently on multiple cores or GPUs. The parallelism can be at the bit, instruction, data, or task level.
Resources are tightly coupled: memory is typically shared by all the cores/GPUs in the system and is used to exchange information, so little communication is needed beyond synchronization.
The main goal is performance: using the processing power of multiple cores/GPUs in parallel to complete the work faster.
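A minimal Python sketch of data-level parallelism: one computation is split into slices, each slice is processed concurrently, and the partial results are combined. The function names are hypothetical. In CPython, threads do not run CPU-bound Python code on several cores at once because of the global interpreter lock, so the sketch defaults to threads only to stay simple; passing `executor_cls=ProcessPoolExecutor` (from the same `concurrent.futures` module) is the assumed swap for true multi-core execution.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def chunk_sum_of_squares(chunk: list[int]) -> int:
    """Work done by one worker: the sum of squares of its slice of the data."""
    return sum(x * x for x in chunk)


def split(data: list[int], parts: int) -> list[list[int]]:
    """Divide the input into at most `parts` contiguous, roughly equal slices."""
    size = max(1, -(-len(data) // parts))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data: list[int], workers: int = 4,
                            executor_cls: type[Executor] = ThreadPoolExecutor) -> int:
    """Split the data, process the slices concurrently, then combine the partial sums."""
    chunks = split(data, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum_of_squares, chunks))


if __name__ == "__main__":
    # Use executor_cls=ProcessPoolExecutor for real multi-core speedup in CPython.
    print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```

The split/compute/combine structure stays the same whichever executor is used; only where the workers run changes.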

There are various kinds of parallel systems.

Multiprocessor parallel systems: The processors have direct access to shared memory (UMA model). They are placed close together and connected by an interconnection network, and interprocess communication is done through read and write operations on shared memory or through message-passing primitives such as those provided by MPI. Typically the processors are of the same type, run the same OS, and reside within the same computer/device with shared memory. Hardware and software are very tightly coupled.
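The shared-memory style of communication can be sketched with Python threads, which all live in one address space: workers exchange results by reading and writing the same object, and a lock provides the synchronization. This illustrates the programming model only, not UMA hardware; the class and function names are hypothetical.

```python
import threading


class SharedCounter:
    """A value in memory that every worker can read and write directly."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()  # synchronization is the only coordination needed

    def add(self, amount: int) -> None:
        with self._lock:  # the read-modify-write must not interleave between workers
            self.value += amount


def worker(counter: SharedCounter, items: list[int]) -> None:
    """Each worker publishes its results by writing straight into shared memory."""
    for x in items:
        counter.add(x)


def run_shared_memory(data: list[int], workers: int = 4) -> int:
    counter = SharedCounter()
    threads = [
        threading.Thread(target=worker, args=(counter, data[i::workers]))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value  # the combined result is read from the same shared memory


if __name__ == "__main__":
    print(run_shared_memory(list(range(100))))  # 4950
```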

Multicomputer parallel systems: The processors do not have direct access to shared memory, and the memories of the processors may or may not form a common address space (when they do, the architecture is typically NUMA). The processors are still placed close together, though they do not have a common clock, and are connected by an interconnection network; they communicate either over the common address space or by message passing.
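When processors do not share memory, they cooperate purely by message passing. In this minimal sketch, threads stand in for separate processors (an assumption made to keep the example self-contained; on a real multicomputer each would be a separate node, and a library such as MPI would carry the messages). Each worker keeps its state private and communicates only through `queue.Queue` mailboxes; the names `node` and `run_message_passing` are hypothetical.

```python
import queue
import threading


def node(node_id: int, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A worker with private state; it only learns about work through messages."""
    local_total = 0  # private to this node; no other worker reads it
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: no more work
            break
        local_total += sum(msg)
    outbox.put((node_id, local_total))  # report the result as a message


def run_message_passing(data: list[int], nodes: int = 3, chunk: int = 10) -> int:
    results: queue.Queue = queue.Queue()
    inboxes = [queue.Queue() for _ in range(nodes)]
    threads = [
        threading.Thread(target=node, args=(i, inboxes[i], results))
        for i in range(nodes)
    ]
    for t in threads:
        t.start()
    # The coordinator sends copies of slices (never references to its own state), round-robin.
    for k, start in enumerate(range(0, len(data), chunk)):
        inboxes[k % nodes].put(data[start:start + chunk])
    for box in inboxes:
        box.put(None)
    for t in threads:
        t.join()
    # Collect exactly one result message per node and combine them.
    return sum(results.get()[1] for _ in range(nodes))
```

The same computation as a shared-memory version, but every piece of information crosses between workers as an explicit message.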

Distributed computing:

A larger program or problem is divided into components that are distributed and executed across multiple computers (computing devices), which are typically physically separate but connected by a network.
Resources are loosely coupled: memory is distributed (private to each computer), and the computers exchange messages because the tasks can be of different kinds and need interprocess communication during execution. The computers may have different processors and run different operating systems and still cooperate with one another. Typically they share neither a common clock nor common memory. The processors communicate over a network, whether a LAN or a WAN, and may be geographically far apart.
The main benefits are scalability, reliability/availability, and support for heterogeneity.
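A loosely coupled exchange can be sketched with plain sockets: two parties that share no memory send explicit messages over a network connection. This minimal sketch assumes a newline-delimited JSON message format invented for the example and runs both ends on 127.0.0.1 so it is self-contained; in a real deployment the serving side would run on another machine. The helper names `serve_once` and `remote_sum` are hypothetical.

```python
import json
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """A 'remote' node: accept one request, compute, reply. It shares no memory with the client."""
    conn, _ = server.accept()
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())  # one JSON message per line
        reply = {"task": request["task"], "result": sum(request["numbers"])}
        stream.write(json.dumps(reply) + "\n")
        stream.flush()


def remote_sum(numbers: list[int]) -> int:
    # Port 0 lets the OS choose a free port; 127.0.0.1 stands in for another machine.
    with socket.create_server(("127.0.0.1", 0)) as server:
        host, port = server.getsockname()
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        with socket.create_connection((host, port)) as client:
            with client.makefile("rw", encoding="utf-8") as stream:
                stream.write(json.dumps({"task": "sum", "numbers": numbers}) + "\n")
                stream.flush()
                reply = json.loads(stream.readline())
        worker.join()
    return reply["result"]


if __name__ == "__main__":
    print(remote_sum([1, 2, 3, 4]))  # 10
```

Everything the server knows about the task arrives in the request message, and everything the client learns comes back in the reply; that explicit serialization is the price of having no shared memory.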

Shouldn't clusters be distributed systems only?

Typically, a cluster consists of many separate systems that do not share memory but are networked together uniformly. Within a typical cluster, however, applications are often parallelized across those systems to improve performance. Note also that a parallel algorithm can be implemented either on a shared-memory system or on a distributed system (using message passing).

Question 2

As you mentioned, it depends on the context. There are two major contexts:

How the cluster handles its own internal tasks (for instance, maintaining a consistent cluster state).
How applications use the cluster.

Internal algorithms are distributed by nature; think of master election and membership algorithms, for example (clusters of course run many more internal tasks, and that does not mean none of them are parallel). Applications, on the other hand, very often parallelize their workloads to run on clusters, and clusters often provide APIs or components such as schedulers to support this. Hadoop-style workloads and their APIs are another example. Databases also use parallelism: parallel query executes a complex query concurrently on more than one node.
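To make the point about internal algorithms concrete, here is a single-process simulation of the bully algorithm, one classic master-election algorithm. It is only an illustration; no particular cluster is claimed to use it. Message sends are recorded in a log, and a node missing from `alive` never replies, standing in for a timeout. The function name and log format are hypothetical.

```python
def bully_election(initiator: int, node_ids: list[int], alive: set[int]) -> tuple[int, list[str]]:
    """Simulate one run of the bully election algorithm; return (leader, message log).

    Nodes are identified by integer IDs and the highest-ID live node must win.
    `alive` models failure detection: a node not in `alive` never answers.
    """
    if initiator not in alive:
        raise ValueError("the initiating node must be alive")
    log: list[str] = []
    to_process = [initiator]
    started: set[int] = set()
    leader = initiator  # overwritten by the node that hears no OK from a higher ID
    while to_process:
        node = to_process.pop(0)
        if node in started:
            continue
        started.add(node)
        answered = []
        for higher in sorted(n for n in node_ids if n > node):
            log.append(f"{node} -> {higher}: ELECTION")
            if higher in alive:
                log.append(f"{higher} -> {node}: OK")
                answered.append(higher)
        if not answered:
            leader = node  # nobody with a higher ID is alive, so this node wins
        to_process.extend(answered)  # each responder runs its own election in turn
    for n in sorted(node_ids):
        if n != leader and n in alive:
            log.append(f"{leader} -> {n}: COORDINATOR")
    return leader, log


if __name__ == "__main__":
    leader, log = bully_election(2, [1, 2, 3, 4, 5], alive={1, 2, 3, 4})
    print(leader)  # 4
```

The leader emerges from messages exchanged between peers that share no state, which is what makes this a distributed algorithm rather than a parallel one.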