jeudi 29 janvier 2015

split line based on space and delete the second part


I have a big file



>fid|29290408|locus|VBIEntCas2262_0001| Phosphoglycolate phosphatase (EC 3.1.3.18) [Enterococcus casseliflavus EC20]
gtgagaaagaaagtactttttgatttagatggaacgatcattgattcgagtgaaggaatc
tatggatcgattcaatatgcgatggaaaaaatgggaaaagagcaattagcgcaagacgta
ctgcggagctttgtggggccgcctttgattgaatccttccgtggcttgggcttcgatgaa
>fid|29290410|locus|VBIEntCas2262_0002| hypothetical protein [Enterococcus casseliflavus EC20]
atgatcggcgaacgttttttgatcacaccgatcgacgaaccgttagacccatacaatgag
ttagtctcaagcaatcagtttactttctttacatcaacctatgatcaaatgttcttgact
ggtcatctgattctagatgttcacccaacttcaggaactttgattttgaaaaacgaaagc
ggctatttggataccaatcttttattggaatcctctccacagttaaaacaaacgaatgcg
>fid|29290414|locus|VBIEntCas2262_0004| FIG00630550: hypothetical protein [Enterococcus casseliflavus EC20]
atgaagcgtgttgcagaaaactatttggttgttttttcgattcttttgctgattatatgg
ctaggcttgatccaagtgaaagaatattcgcaagaagtagccctgtcgatcatttacttt


I need to split each line beginning with ">" based on the space, retaining in the new file only the part before the spaces, with the following lines.


So the file I need should be:



>fid|29290408|locus|VBIEntCas2262_0001|
gtgagaaagaaagtactttttgatttagatggaacgatcattgattcgagtgaaggaatc
tatggatcgattcaatatgcgatggaaaaaatgggaaaagagcaattagcgcaagacgta
ctgcggagctttgtggggccgcctttgattgaatccttccgtggcttgggcttcgatgaa


an so on.


the number of lines following the header (starting with >) is not fixed.


How could I do?



Aucun commentaire:

Enregistrer un commentaire