我正在开发一个查询处理器,它从内存中读取文档 ID 的长列表并查找匹配的 ID。当它找到一个时,它会创建一个包含 docid(int)和文档等级(double)的 DOC 结构,并将其推送到优先级队列。我的问题是,当搜索的单词有很长的列表时,当我尝试将 DOC 推送到队列时,出现以下异常:QueryProcessor.exe 中 0x7c812afb 处未处理的异常:微软C++异常:std::bad_alloc 位于内存位置 0x0012ee88..

当单词有一个简短的列表时,效果很好。我尝试在代码中的多个位置将 DOC 推送到队列中,它们都可以工作,直到某一行;之后,我收到上述错误。我完全不知道出了什么问题,因为读入的最长列表小于 1 MB,并且我释放了分配的所有内存。当我尝试将 DOC 推送到有容量容纳它的队列时(我使用了一个保留了足够空间的向量作为优先级队列的底层数据结构),为什么会突然出现 bad_alloc 异常?

我知道在没有看到所有代码的情况下几乎不可能回答这样的问题,但在这里发布太长了。我尽我所能地投入,并焦急地希望有人能给我答案,因为我束手无策。

NextGEQ 函数逐块读取 docids 的压缩块列表。也就是说,如果它发现块中的lastdocid(在单独的列表中)大于传入的docid,它将解压缩该块并搜索,直到找到正确的块。每个列表都以有关该列表的元数据开始,其中包含每个压缩块的长度以及该块中的最后一个 docid。data.iquery 指向元数据的开头;data.metapointer 指向该函数当前在元数据中的任何位置;data.blockpointer 指向未压缩 docids 块的开头(如果有的话)。如果它发现它已经解压,它就会进行搜索。下面,当我第一次调用该函数时,它解压缩一个块并找到 docid;之后推送到队列中。第二次甚至不需要解压;也就是说,没有分配新的内存,但在那之后,推送到队列会出现 bad_alloc 错误。

编辑:我进一步清理了我的代码,以便它可以编译。我还添加了 OpenList() 和 NextGEQ 函数,尽管后者很长,因为我认为问题是由其中某处的堆损坏引起的。多谢!

struct DOC{

    long int docid;
    long double rank;

public:
    DOC()
    {
        docid = 0;
        rank = 0.0;
    }

    DOC(int num, double ranking)
    {
        docid = num;
        rank = ranking;

    }

     bool operator>( const DOC & d ) const {
       return rank > d.rank;
    }

      bool operator<( const DOC & d ) const {
       return rank < d.rank;
    }
    };

struct listnode{

int* metapointer;
int* blockpointer;
int docposition;
int frequency;
int numberdocs;
int* iquery;
listnode* nextnode;

};

void QUERYMANAGER::SubmitQuery(char *query){

    listnode* startlist;

        vector<DOC> docvec;
        docvec.reserve(20);
        DOC doct;


    //create a priority queue to use as a min-heap to store the documents and rankings;


        priority_queue<DOC, vector<DOC>,std::greater<DOC>> q(docvec.begin(), docvec.end());

        q.push(doct);

    //do some processing here; startlist is a pointer to a listnode struct that starts the   //linked list

        //point the linked list start pointer to the node returned by the OpenList method

        startlist = &OpenList(value);
        listnode* minpointer;
        q.push(doct);


        //start by finding the first docid in the shortest list
            int i = 0;
            q.push(doct);
            num = NextGEQ(0, *startlist);
            q.push(doct);
            while(num != -1)
               {

            q.push(doct);

    //the is where the problem starts - every previous q.push(doct) works; the one after
    //NextGEQ(num +1, *startlist) gives the bad_alloc error

            num = NextGEQ(num + 1, *startlist);

         //this is where the exception is thrown
            q.push(doct);               
        }

    }



//takes a word and returns a listnode struct with a pointer to the beginning of the list
//and metadata about the list 
listnode QUERYMANAGER::OpenList(char* word)
{
    long int numdocs;

    //create a new node in the linked list and initialize its variables


    listnode n;
    n.iquery = cache -> GetiList(word, &numdocs);
    n.docposition = 0;
    n.frequency = 0;
    n.numberdocs = numdocs;

   //an int pointer to point to where in the metadata you are
    n.metapointer = n.iquery;
    n.nextnode = NULL;
  //an int pointer to point to the uncompressed block of data, if there is one
    n.blockpointer = NULL;



    return n;


}


int QUERYMANAGER::NextGEQ(int value, listnode& data)
{
     int lengthdocids;
     int lengthfreqs; 
     int lengthpos;
     int* temp;
     int lastdocid;


     lastdocid = *(data.metapointer + 2);

while(true)
{

         //if it's not the first chunk in the list, the blockpointer will be pointing to the 
        //most recently opened block and docpos to the current position in the block
    if( data.blockpointer && lastdocid >= value)
    {

            //if the last docid in the chunk is >= the docid we're looking for,
            //go through the chunk to look for a match


        //the last docid in the block is in lastdocid; keep going until you hit it
        while(*(data.blockpointer + data.docposition) <= lastdocid)
        {
            //compare each docid with the docid passed in; if it's greater than or equal to it, return a pointer to the docid
             if(*(data.blockpointer + data.docposition ) >= value)
             {

                 //return the next greater than or equal docid
                 return *(data.blockpointer + data.docposition);
             }
             else
             {
                 ++data.docposition;
             }
        }

        //read through the whole block; couldn't find matching docid; increment metapointer to the next block;
        //free the block's memory

        data.metapointer += 3;
        lastdocid = *(data.metapointer + 3);
        free(data.blockpointer);
        data.blockpointer = NULL;
    }


        //reached the end of a block; check the metadata to find where the next block begins and ends and whether 
        //the last docid in the block is smaller or larger than the value being searched for


        //first make sure that you haven't reached the end of the list
            //if the last docid in the chunk is still smaller than the value passed in, move the metadata pointer
           //to the beginning of the next chunk's metadata; read in the new metadata


            while(true)
         //  while(*(metapointers[index]) != 0 )
           {
               if(lastdocid < value && *(data.metapointer) !=0)
               {
               data.metapointer += 3;
               lastdocid = *(data.metapointer + 2);
               }


           else if(*(data.metapointer) == 0)
           {
               return -1;
             }

           else
               //we must have hit a chunk whose lastdocid is >= value; read it in
           {
                //read in the metadata
           //the length of the chunk of docid's is cumulative, so subtract the end of the last chunk 
           //from the end of this chunk to get the length

               //find the end of the metadata


                temp = data.metapointer;

            while(*temp != 0)
            {
                temp += 3;
            }
                temp += 2;
    //temp is now pointing to the beginning of the list of compressed data; use the location of metapointer
    //to calculate where to start reading and how much to read

         //if it's the first chunk in the list,the corresponding metapointer is pointing to the beginning of the query
        //so the number of bytes of docid's is just the first integer in the metadata
                if(  data.metapointer == data.iquery)
                {
                    lengthdocids = *data.metapointer;

                }

                else
                {
                    //start reading from the offset of the end of the last chunk (saved in metapointers[index] - 3)
                    //plus 1 = the beginning of this chunk

                    lengthdocids = *(data.metapointer) - (*(data.metapointer - 3));
                    temp += (*(data.metapointer - 3)) / sizeof(int); 

                   }


           //allocate memory for an array of integers - the block of docid's uncompressed
           int* docblock = (int*)malloc(lengthdocids * 5 );

           //decompress docid's into the block of memory allocated
            s9decompress((int*)temp, lengthdocids /4, (int*) docblock, true);

            //set the blockpointer to point to the beginning of the block
            //and docpositions[index] to 0
            data.blockpointer = docblock;
            data.docposition = 0;
            break;

                }

           } 

}
}

非常感谢你,bsg。

有帮助吗?

解决方案

假设你有堆损坏和实际上不是耗尽存储器,堆可以遭到损坏时,最常见的方法是通过删除(或释放)相同的指针两次。你可以很容易地找到了,如果这是通过简单地注释掉所有呼叫删除(或免费)的问题。这将导致你的程序泄漏像筛子,但是如果它实际上并没有崩溃,你可能已经发现了问题。

一个腐败的堆的另一常见原因原因是删除(或释放)未曾经在堆上分配的指针。腐败的两种原因之间的区别并不总是很容易,但是你的首要任务应该是找出是否损坏是问题。

请注意此方法,如果要删除的东西都如不叫破你的程序的语义析构函数将无法正常使用。

其他提示

QUERYMANAGER::OpenList 按值返回列表节点。在 startlist = &OpenList(value); 然后,您继续获取返回的临时对象的地址。当临时数据消失后,您可能可以访问数据一段时间,然后数据就会被覆盖。您可以在堆栈上声明一个非指针listnode startlist并直接为其分配返回值吗?然后删除其他用途前面的 * 并查看是否可以解决问题。

你可以尝试的另一件事是用智能指针代替所有的指针,特别是像boost::shared_ptr<>,取决于有多少代码这真的是,有多少你满意自动化任务。智能指针不是一切问题的答案,但他们至少比原始指针更安全。

感谢您的帮助。你是对的,尼尔 - 我必须设法败坏我的堆。我真不知道是什么导致它,但是当我改变了的malloc(numdocids * 5)对malloc(256),它奇迹般地停止崩溃。我想我应该先检查了我的mallocs与否实际上是成功的!再次感谢! BSG

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top