del_gendisk hanging during cleanup of block device after media removal during IO

Question

This is caused by not ending the request that was interrupted by the device removal. In my driver, I had the following request_fn:

static void mydev_submit_req(struct request_queue *q)
{
    struct mydev_info *mydev = q->queuedata;

    if (!mydev) {
        struct request *req;

        // device is gone: fail every queued request quietly with -ENODEV
        while ((req = blk_fetch_request(q)) != NULL) {
            req->cmd_flags |= REQ_QUIET;
            __blk_end_request_all(req, -ENODEV);
        }
    } else {
        queue_work(mydev->wq, &mydev->work);
    }
}

This prevents requests from entering the driver's workqueue once the device has disappeared (signalled by q->queuedata, and therefore mydev, being NULL). However, cleanup still hung: the request that was in progress when the device vanished was never completed, so q->rq.elvpriv (called q->nr_rqs_elvpriv in later kernels) stayed at 1. blk_drain_queue() therefore spun forever, which hung blk_cleanup_queue() and prevented the driver from removing the device.
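
To make the accounting problem easier to see, here is a toy user-space model of it. This is not kernel code: struct toy_queue and every toy_* function are hypothetical names made up for illustration, and in_flight only stands in for the elvpriv counter. The sketch assumes that a fetched request is counted until it is ended, which is the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_queue {
    int pending;    /* requests still waiting in the queue */
    int in_flight;  /* fetched but not yet ended (stand-in for elvpriv) */
};

/* Models blk_fetch_request(): dequeue one request and start it. */
static bool toy_fetch(struct toy_queue *q)
{
    if (q->pending == 0)
        return false;
    q->pending--;
    q->in_flight++;
    return true;
}

/* Models __blk_end_request_all(): complete a started request. */
static void toy_end(struct toy_queue *q)
{
    assert(q->in_flight > 0);
    q->in_flight--;
}

/* Models the drain check: teardown may proceed only when this is true. */
static bool toy_drained(const struct toy_queue *q)
{
    return q->pending == 0 && q->in_flight == 0;
}

/*
 * Models the workqueue callback. The device "ejects" while request
 * number eject_at (1-based) is being handled. end_on_eject selects
 * between the original behaviour (false) and the fix (true).
 */
static void toy_worker(struct toy_queue *q, int eject_at, bool end_on_eject)
{
    int n = 0;

    while (toy_fetch(q)) {
        n++;
        if (n == eject_at) {
            if (end_on_eject)
                toy_end(q);   /* the call that was missing */
            break;
        }
        toy_end(q);
    }
}

/* Models the request_fn after removal: fail everything still queued. */
static void toy_flush(struct toy_queue *q)
{
    while (toy_fetch(q))
        toy_end(q);
}
```

With three queued requests and an eject during the second one, running toy_worker() with end_on_eject set to false and then toy_flush() leaves in_flight at 1, so toy_drained() never becomes true. With end_on_eject set to true the counter returns to 0 and the queue drains.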

The solution looks like this (in the workqueue callback function in my driver, but this depends on how you structure the IO work):

req = blk_fetch_request(q);
while (req) {
    // returns -ENODEV if the disk is ejected during the transfer;
    // bytes tells us how many bytes we managed to transfer
    res = mydev_do_req(q, req, &bytes);

    if (unlikely(res == -ENODEV)) {
        dev_err(&mydev->pdev->dev,
            "device ejected during transfer, returning\n");

        // end the current request, since we started it
        // THIS IS WHAT WAS MISSING
        __blk_end_request_all(req, -ENODEV);

        // get out - the rest of the queue will be emptied on the next
        // call to mydev_submit_req()
        break;
    } else if (!__blk_end_request(req, res, bytes)) {
        // request fully completed: get the next request
        req = blk_fetch_request(q);
    }
}