Friday, 30 January 2015

Writing a program for editing .txt data - Python or Unix?


I have only a little programming experience, and I'm currently working on improving my skills. Basically, I need to write a program that performs some specific processing steps on the data in a .txt file.


To start from scratch, I have a .txt file with data looking like this:



>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrstuv
>tex_4 abcdefghijklmnopqrst
// x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrst
>tex_4 abcdefghijklmnopqrstuv
// x x
>tex_1 abcdefghijklmnopqrstuv
>tex_2 abcdefghijklmnopqrstuv
// x x


I need to do some weird stuff to this data in order to end up with a data set that can be analyzed in the software I use. Each "//..." line refers to the group of data above it, back to the previous "//..." line.


Here's a line-up of what I want to do:


Move each "//..." line so that the group of data it refers to sits below it rather than above it:



// x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrstuv
>tex_4 abcdefghijklmnopqrst
// x x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrst
>tex_4 abcdefghijklmnopqrstuv
// x x
>tex_1 abcdefghijklmnopqrstuv
>tex_2 abcdefghijklmnopqrstuv
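
One way this step could be approached in Python is to buffer the data lines until a "//" line shows up, then emit the "//" line first and the buffered group after it. The following is only a sketch: the function name `move_markers_above` and the shortened sample lines are made up for illustration, and it assumes every marker line starts with "//".

```python
def move_markers_above(lines):
    """Return the lines with each '//' marker moved above the group it closes."""
    result = []
    group = []  # data lines seen since the previous marker
    for line in lines:
        if line.startswith("//"):
            result.append(line)   # marker goes first ...
            result.extend(group)  # ... followed by the group it refers to
            group = []
        else:
            group.append(line)
    result.extend(group)  # trailing lines without a marker are kept unchanged
    return result


# Shortened, made-up sample in the same layout as the data above.
sample = [
    ">tex_1 abc",
    ">tex_2 abcd",
    "// x",
    ">tex_1 abcde",
    "// x x",
]
shifted = move_markers_above(sample)
for line in shifted:
    print(line)
```

Because the function works on a list of lines, the original file is never modified; the result can be written to a new file afterwards.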


Add a unique name to each group right after the //, without shifting the remaining text on the line:



//Name 1 x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrstuv
>tex_4 abcdefghijklmnopqrst
//Name 2 x x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrst
>tex_4 abcdefghijklmnopqrstuv
//Name 3 x x
>tex_1 abcdefghijklmnopqrstuv
>tex_2 abcdefghijklmnopqrstuv
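
A possible Python sketch of the naming step: count the "//" lines and write "Name N" over the spaces that follow the "//", so that the later columns stay where they are. The function name `add_names` and the `prefix` parameter are hypothetical. The sketch assumes the real file has a run of spaces after each "//"; when there are fewer spaces than the name needs, one separating space is kept and the rest of the line moves right.

```python
def add_names(lines, prefix="Name "):
    """Overwrite the padding after each '//' with a numbered name."""
    named = []
    count = 0
    for line in lines:
        if line.startswith("//"):
            count += 1
            name = f"{prefix}{count}"
            body = line[2:]
            padding = len(body) - len(body.lstrip(" "))
            # Consume only as many spaces as the name occupies.
            remainder = body[min(padding, len(name)):]
            if remainder and not remainder.startswith(" "):
                remainder = " " + remainder  # keep name and text apart
            line = "//" + name + remainder
        named.append(line)
    return named


# Made-up sample; the second marker has ten spaces of padding.
sample = [
    "// x",
    ">tex_1 abc",
    "//          x",
    ">tex_1 abcd",
]
named = add_names(sample)
for line in named:
    print(line)
```

In the second marker of the sample, the "x" stays in the same column as before, because the name only replaces six of the ten padding spaces.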


Output this to a new file, without changing the original. Then grab each name line plus the line directly below it, and output these to File2:



//Name 1 x
>tex_1 abcdefghijklmnopqrstu
//Name 2 x x
>tex_1 abcdefghijklmnopqrstu
//Name 3 x x
>tex_1 abcdefghijklmnopqrstuv
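
Picking each name line plus the line below it could look roughly like this in Python. The helper names `marker_and_first_record` and `write_lines` are invented for this sketch, and the output name `File2.txt` is only assumed from the description above. A temporary folder is used so that the example runs without touching any real file.

```python
import tempfile
from pathlib import Path


def marker_and_first_record(lines):
    """Keep every '//' line and the single data line directly below it."""
    picked = []
    take_next = False
    for line in lines:
        if line.startswith("//"):
            picked.append(line)
            take_next = True
        elif take_next:
            picked.append(line)
            take_next = False
    return picked


def write_lines(path, lines):
    """Write the lines to a new file; the input data is left untouched."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


named = [
    "//Name 1 x",
    ">tex_1 abcdefghijklmnopqrstu",
    ">tex_2 abcdefghijklmnopqrstuv",
    "//Name 2 x x",
    ">tex_1 abcdefghijklmnopqrstu",
    ">tex_2 abcdefghijklmnopqrstuv",
]
file2_lines = marker_and_first_record(named)

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "File2.txt"  # assumed output name
    write_lines(target, file2_lines)
    written = target.read_text(encoding="utf-8").splitlines()

for line in file2_lines:
    print(line)
```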


Change the structure so that each record carries the name of its group, as follows, and output this to File3:



>Name 1 abcdefghijklmnopqrstu
>Name 2 abcdefghijklmnopqrstu
>Name 3 abcdefghijklmnopqrstuv
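
The restructuring for File3 might be sketched in Python as follows: take the name from each "//" line, take the text after the first space of the record below it, and join the two. The function name `to_named_records` is hypothetical, and the regular expression assumes the names look exactly like "Name" followed by a number, as in the example above.

```python
import re

# Assumption: names have the form "Name <number>" right after the "//".
NAME_PATTERN = re.compile(r"^//\s*(Name \d+)")


def to_named_records(lines):
    """Turn '//Name N ...' + '>tex_1 data' pairs into '>Name N data' lines."""
    records = []
    current_name = None
    for line in lines:
        match = NAME_PATTERN.match(line)
        if match:
            current_name = match.group(1)
        elif line.startswith(">") and current_name is not None:
            parts = line.split(maxsplit=1)  # [">tex_1", "data"]
            data = parts[1] if len(parts) > 1 else ""
            records.append(f">{current_name} {data}")
            current_name = None  # use only the first record of each group
    return records


file2_lines = [
    "//Name 1 x",
    ">tex_1 abcdefghijklmnopqrstu",
    "//Name 2 x x",
    ">tex_1 abcdefghijklmnopqrstu",
    "//Name 3 x x",
    ">tex_1 abcdefghijklmnopqrstuv",
]
file3_lines = to_named_records(file2_lines)
for line in file3_lines:
    print(line)
```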


The above data is in a structure I can actually analyse.


Now, I know this is quite a task (especially for a complete programming noob), and I am not asking you "how do I program this?". I would just like to know where you would start with such a project, and which language you think fits it best.


I managed to do a few things in Unix with help from this site, e.g. giving a unique name to each "//..." line with the following awk command:



awk -F '' '/\/\//{n++ ; t=" Name "n ; sub("// {0,"length(t)-1"}","//"t)}{print}' File1.txt


Could you give me some hints on where to start? Is the problem suitable as a Python project? The original .txt file contains a lot of data, so it is not possible to do the processing by hand. This project is also meant as a way to get further into programming.


Thank you!


