freeing "copy-on-write" memory that wasn't changed

Question 1

CoW is just a lazy optimization. You can safely reason as if fork() always made a full, immediate copy of the process (of its memory, at least). But…

If you prepared a dynamically allocated chunk of data to "pass" to the forked child, then after the fork there are two processes, parent and child, each with its own copy of that chunk. When the child exits, its copy of the memory is reclaimed, but the parent has to free its own copy itself, and it can do so right after the fork.

To make this clearer, here is an example:

char *buf = malloc(123456);
// … fill buf for child …

pid_t res = fork(); // fork() returns pid_t (declared in <unistd.h>)

if (res == -1) {
    fprintf(stderr, "fork failed\n");
    exit(EXIT_FAILURE);
}

if (res == 0) {
    // this is child process
    // … do work with buf …
    _Exit(EXIT_SUCCESS); // the child's copy of buf is reclaimed when it exits
}

// this is parent process
free(buf); // we don't need it in parent

// … other parent tasks here …

CoW is also a very useful optimization for the fork-exec technique, where the child does nothing but call exec with prepared arguments. exec replaces the current process image with the specified executable, retaining open descriptors and a few other things (see man 2 execve for details). Typically the only memory copied after such a fork is the page or two holding the current stack frame, which makes fork-exec very efficient.
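As a rough illustration of the fork-exec pattern described above, here is a minimal sketch. The helper name run_command and the exit code 127 for a failed exec are assumptions made for this example (127 follows the shell convention); they are not taken from the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the argument vector argv in a
 * child process. Returns the child's exit code, or -1 on failure. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: touch as little memory as possible, then replace the
         * process image. Almost nothing has to be copied before exec. */
        execvp(argv[0], argv);
        /* Only reached if exec failed. */
        _exit(127);
    }

    /* Parent: wait for the child and report how it ended. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argument vector such as {"sh", "-c", "exit 3", NULL}, the function should return 3, provided sh is on the PATH.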

Some systems also provide vfork, a very restrictive variant of fork: the child borrows the parent's address space, and the parent is suspended until the child calls exec or _exit. On systems without CoW, that is the only way to fork-exec efficiently.

Question 2

First the logical (process centered) view:

When you fork a process, the entire address space is copied into a new process as is. Your heap is essentially duplicated in both processes, and both processes can continue to use it just like one process could if fork() had never been called. Both processes can free an allocation that was done before the fork(), and they must do so if they want to reuse the address range connected to the allocation. CoW mappings are only an optimization that does not change these semantics.
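The logical view can be checked with a small experiment. The sketch below is only an illustration; the function name parent_copy_survives_child and the buffer contents are made up for the example. The child overwrites and frees its copy of a heap block, and the parent then verifies that its own copy is untouched before freeing it.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent's buffer is unchanged after
 * the child overwrote and freed its own copy, 0 if not, -1 on error. */
int parent_copy_survives_child(void)
{
    char *buf = malloc(64);
    if (buf == NULL)
        return -1;
    strcpy(buf, "written before fork");

    pid_t pid = fork();
    if (pid == -1) {
        free(buf);
        return -1;
    }

    if (pid == 0) {
        /* Child: modify and free its copy; the parent's copy is unaffected. */
        strcpy(buf, "changed in child");
        free(buf);
        _exit(EXIT_SUCCESS);
    }

    /* Parent: wait so the child has definitely finished its writes. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        free(buf);
        return -1;
    }

    int unchanged = strcmp(buf, "written before fork") == 0;
    free(buf); /* the parent still owns, and must free, its own copy */
    return unchanged;
}
```

Whether the kernel copied the pages eagerly or lazily makes no difference to the result, which is the point of treating CoW as a pure optimization.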

Now the physical (system centered) view:

Your system kernel does not know about the data ranges you have allocated using malloc(); it only knows about the memory pages it has handed to the process at the request of malloc(). When you call fork(), it marks all these pages as CoW and references them from both processes. If either process writes to any of the CoW pages while the other process still exists, the write traps into the kernel, which copies the entire page. And if one of the processes exits, it will at the very least lower the reference count of these pages, so that they do not have to be copied anymore.

So, what happens when you call free() in the child before exiting?
Well, the free() function will most likely write to the page containing the memory allocation to tell malloc() that the block is available again. This will trap into the system and copy the page; expect this operation to take a microsecond or two. If your parent process calls free() while the child is still alive, the same will happen. However, if your child does not free the block and simply exits, the kernel knows that it no longer has to perform CoW for those pages. If the parent frees and reuses the memory region afterwards, no copy needs to be made.

I assume that all your child does is check for some error condition and exit immediately if it is met. In that case, the most prudent approach is to skip calling free() in the child and let the system do its work.
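A minimal sketch of that approach follows, under the assumption that the child only needs to read the buffer. The names check_in_child and has_error, and the "error condition" itself (a '!' byte in the data), are hypothetical placeholders for whatever the real check is.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: the "error condition" is a '!' byte in the buffer. */
static int has_error(const char *buf, size_t len)
{
    return memchr(buf, '!', len) != NULL;
}

/* Returns the child's exit code (0 = ok, 1 = error found), -1 on failure. */
int check_in_child(const char *text)
{
    size_t len = strlen(text);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, text, len + 1);

    pid_t pid = fork();
    if (pid == -1) {
        free(buf);
        return -1;
    }

    if (pid == 0) {
        /* Child: only reads buf and never calls free(), so the heap
         * pages holding buf should not need to be copied. */
        _exit(has_error(buf, len) ? 1 : 0);
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);

    /* The child is gone, so free() writing to the page needs no copy. */
    free(buf);

    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent waits for the child before calling free(); freeing earlier would also be correct, but could cost one page copy while the child is still alive.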