Parsing a GenBank File Format with Biopython's SeqIO
I'm trying to parse a protein GenBank file. Here's an example file (example.protein.gpff): LOCUS NP_001346895 208 aa linear PRI 20-JAN-2018 DEF
Solution 1:
Check out the Genebank-parser library. It accepts a GenBank filename and a batch size; next_batch yields as many records as batch_size specifies.
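If you would rather stay with Biopython, the same batching idea can be sketched with SeqIO.parse, which returns an iterator of records. This is not the Genebank-parser API; the helper names batches and genbank_batches are made up for illustration, and the sketch assumes Biopython is installed.

```python
from itertools import islice


def batches(iterable, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def genbank_batches(path, batch_size):
    """Yield lists of SeqRecord objects read lazily from a GenBank file.

    Hypothetical helper: it only wraps Biopython's SeqIO.parse in batches().
    """
    # Imported here so that batches() stays usable without Biopython.
    from Bio import SeqIO

    yield from batches(SeqIO.parse(path, "genbank"), batch_size)
```

For the file in the question this could be used as `for chunk in genbank_batches("example.protein.gpff", 100): ...`, where each chunk is a list of up to 100 records. Because SeqIO.parse is lazy, only one batch is held in memory at a time.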
Solution 2:
The easiest way to deal with this file format seems to be converting it to JSON (for example, using Biopython's Bio package) and then reading it with a JSON parser (such as the rjson package in R, which parses a JSON file into a list of records).
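A minimal sketch of that conversion is shown here, assuming Biopython is installed. The function names and the choice of fields are my own, not something Biopython prescribes. Record annotations are left out on purpose because they can hold objects (such as references) that json cannot serialise directly.

```python
import json


def record_to_dict(record):
    """Flatten one SeqRecord-like object into JSON-serialisable types."""
    return {
        "id": record.id,
        "name": record.name,
        "description": record.description,
        "sequence": str(record.seq),
        "features": [
            {
                "type": feature.type,
                "location": str(feature.location),
                # Qualifier values are normally lists of strings; wrap
                # anything else so the output shape stays consistent.
                "qualifiers": {
                    key: value if isinstance(value, list) else [value]
                    for key, value in feature.qualifiers.items()
                },
            }
            for feature in record.features
        ],
    }


def genbank_to_json(gb_path, json_path):
    """Write every record of a GenBank file to one JSON array (hypothetical helper)."""
    # Imported here so that record_to_dict() stays usable without Biopython.
    from Bio import SeqIO

    records = [record_to_dict(r) for r in SeqIO.parse(gb_path, "genbank")]
    with open(json_path, "w") as handle:
        json.dump(records, handle, indent=2)
```

Calling `genbank_to_json("example.protein.gpff", "example.protein.json")` would then give a file that a JSON parser such as rjson can load as a list of records.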