I have a big file
>fid|29290408|locus|VBIEntCas2262_0001| Phosphoglycolate phosphatase (EC 3.1.3.18) [Enterococcus casseliflavus EC20]
gtgagaaagaaagtactttttgatttagatggaacgatcattgattcgagtgaaggaatc
tatggatcgattcaatatgcgatggaaaaaatgggaaaagagcaattagcgcaagacgta
ctgcggagctttgtggggccgcctttgattgaatccttccgtggcttgggcttcgatgaa
>fid|29290410|locus|VBIEntCas2262_0002| hypothetical protein [Enterococcus casseliflavus EC20]
atgatcggcgaacgttttttgatcacaccgatcgacgaaccgttagacccatacaatgag
ttagtctcaagcaatcagtttactttctttacatcaacctatgatcaaatgttcttgact
ggtcatctgattctagatgttcacccaacttcaggaactttgattttgaaaaacgaaagc
ggctatttggataccaatcttttattggaatcctctccacagttaaaacaaacgaatgcg
>fid|29290414|locus|VBIEntCas2262_0004| FIG00630550: hypothetical protein [Enterococcus casseliflavus EC20]
atgaagcgtgttgcagaaaactatttggttgttttttcgattcttttgctgattatatgg
ctaggcttgatccaagtgaaagaatattcgcaagaagtagccctgtcgatcatttacttt
I need to split each line beginning with ">" based on the space, retaining in the new file only the part before the spaces, with the following lines.
So the file I need should be:
>fid|29290408|locus|VBIEntCas2262_0001|
gtgagaaagaaagtactttttgatttagatggaacgatcattgattcgagtgaaggaatc
tatggatcgattcaatatgcgatggaaaaaatgggaaaagagcaattagcgcaagacgta
ctgcggagctttgtggggccgcctttgattgaatccttccgtggcttgggcttcgatgaa
an so on.
the number of lines following the header (starting with >) is not fixed.
How could I do?
Aucun commentaire:
Enregistrer un commentaire