Question

I have known about this library from Heng Li for a while, but I have not attempted to use it until now, mostly because up until now python was fast enough for me.

Here is the link to the header: http://lh3lh3.users.sourceforge.net/kseq.shtml

When I attempt to use the following to parse a fasta file, it returns -1 for the length for the sequence line. I have looked over the Li's code and this seems to be designed mainly for FASTQ parsing, but he does say on his webpage that it also supports the FASTA format.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include "kseq.h"  
// STEP 1: declare the type of file handler and the read() function  
KSEQ_INIT(FILE*, read)


int main(int argc, char** argv) {
    FILE* fp = fopen(argv[1], "r"); // STEP 2: open the file handler
    kseq_t *seq = kseq_init(fp); // STEP 3: initialize seq 

    int l;

    while ((l = kseq_read(seq)) >= 0) { // STEP 4: read sequence  
        printf("name: %s\n", seq->name.s);  
        if (seq->comment.l) printf("comment: %s\n", seq->comment.s);  
        printf("seq: %s\n", seq->seq.s);  
        if (seq->qual.l) printf("qual: %s\n", seq->qual.s);  
    }

    printf("return value: %d\n", l);  
    kseq_destroy(seq); // STEP 5: destroy seq
    fclose(fp);

    return (0);
}

The FASTA I have been using to test with is the Hg19 GRCH37 ChrY.fa file available from multiple sources including the Broad Institute.

Any help would be appreciated.

Was it helpful?

Solution

First you should check the return value of fopen():

FILE* fp = fopen(argv[1], "r"); // STEP 2: open the file handler
if(fp == 0) {
    perror("fopen");
    exit(1);
}

Second, I looked at the header file and I think kseg_init takes an fd not a FILE *. You can get an fd from a FILE * with fileno().

kseq_t *seq = kseq_init(fp); // STEP 3: initialize seq 

Should be:

kseq_t *seq = kseq_init(fileno(fp)); // STEP 3: initialize seq 

OTHER TIPS

Here is the complete code that is working for me

#include <zlib.h>
#include <stdio.h>
#include "kseq.h"
KSEQ_INIT(int, read)

int main(int argc, char **argv)
{
        FILE* fp;
        kseq_t *seq;
        int n = 0, slen = 0, qlen = 0;
        fp = fopen(argv[1], "r");
        seq = kseq_init(fileno(fp));
        while (kseq_read(seq) >= 0)
                ++n ;//slen += seq->seq.l, qlen += seq->qual.l;
        printf("%d\t%d\t%d\n", n, slen, qlen);
        kseq_destroy(seq);
        fclose(fp);
        return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top