I have only a little programming experience, and I'm currently working on improving my skills. Basically, I need to write a program that applies a few specific processing steps to the data in a .txt file.
To start from scratch, I have a .txt file with data that looks like this:
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrstuv
>tex_4 abcdefghijklmnopqrst
// x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrst
>tex_4 abcdefghijklmnopqrstuv
// x x
>tex_1 abcdefghijklmnopqrstuv
>tex_2 abcdefghijklmnopqrstuv
// x x
I need to do some weird stuff to this data in order to end up with a data set that can be analyzed in the software I use. Each "//..." line refers to the group of data above it, back to the previous "//..." line (or the start of the file).
Here's a rundown of what I want to do:
Shift each "//..." line so that the group of data it refers to is below the line, not above it:
// x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrstuv
>tex_4 abcdefghijklmnopqrst
// x x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrst
>tex_4 abcdefghijklmnopqrstuv
// x x
>tex_1 abcdefghijklmnopqrstuv
>tex_2 abcdefghijklmnopqrstuv
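A minimal Python sketch of this first step, assuming the whole file fits in memory as a list of lines and that every marker line starts with "//". The function name move_markers_above is made up for illustration.

```python
def move_markers_above(lines):
    """Move each '//' line from below its group to above it."""
    result = []
    group = []  # data lines collected since the last '//' line
    for line in lines:
        if line.startswith("//"):
            result.append(line)   # marker first ...
            result.extend(group)  # ... then the group it describes
            group = []
        else:
            group.append(line)
    result.extend(group)  # keep trailing lines that had no marker
    return result


sample = [
    ">tex_1 abcdefghijklmnopqrstu",
    ">tex_2 abcdefghijklmnopqrstuv",
    "// x",
    ">tex_1 abcdefghijklmnopqrstuv",
    "// x x",
]
for line in move_markers_above(sample):
    print(line)
```

Buffering one group at a time keeps the logic simple, because a group can only be written once its marker line has been seen.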
Add a unique name to each group right after the //, without shifting the remaining text on the line:
//Name 1 x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrstuv
>tex_4 abcdefghijklmnopqrst
//Name 2 x x
>tex_1 abcdefghijklmnopqrstu
>tex_2 abcdefghijklmnopqrstuv
>tex_3 abcdefghijklmnopqrst
>tex_4 abcdefghijklmnopqrstuv
//Name 3 x x
>tex_1 abcdefghijklmnopqrstuv
>tex_2 abcdefghijklmnopqrstuv
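A possible Python sketch of the naming step. It assumes the gap after "//" is padded with spaces (not tabs) and that the label "Name N" is what is wanted. The label overwrites padding, so the remaining text keeps its column whenever the gap is wide enough; if it is not, the text is pushed right by the minimum amount. The function name name_markers is hypothetical.

```python
def name_markers(lines, prefix="Name "):
    """Number each '//' line, writing the label over the padding after '//'."""
    named = []
    count = 0
    for line in lines:
        if not line.startswith("//"):
            named.append(line)
            continue
        count += 1
        label = f"{prefix}{count}"
        rest = line[2:]
        padding = len(rest) - len(rest.lstrip(" "))  # spaces right after '//'
        if padding > len(label):
            # Enough room: overwrite spaces so later columns stay in place.
            rest = label + rest[len(label):]
        else:
            # Not enough room: keep one space between label and text.
            rest = label + " " + rest.lstrip(" ")
        named.append(("//" + rest).rstrip())  # trailing whitespace is dropped
    return named


wide = "//" + " " * 10 + "x"
for line in name_markers([wide, ">tex_1 abcdefghijklmnopqrstu", "// x x"]):
    print(line)
```

With the wide marker line the result has the same length as the input, which is the property that matters if the x characters mark positions in the data lines below.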
Output this to a new file without changing the original. Then grab each name line plus the line directly below it, and output these to File2:
//Name 1 x
>tex_1 abcdefghijklmnopqrstu
//Name 2 x x
>tex_1 abcdefghijklmnopqrstu
//Name 3 x x
>tex_1 abcdefghijklmnopqrstuv
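A rough Python sketch of this extraction, assuming the named lines are already in a list and that each group's first data line directly follows its "//" line. The names first_of_each_group and save, and the file name in the usage note, are placeholders.

```python
from pathlib import Path


def first_of_each_group(lines):
    """Keep every '//' line together with the single line that follows it."""
    picked = []
    for i, line in enumerate(lines):
        if line.startswith("//"):
            picked.append(line)
            # Guard against a marker at the end or two markers in a row.
            if i + 1 < len(lines) and not lines[i + 1].startswith("//"):
                picked.append(lines[i + 1])
    return picked


def save(lines, path):
    """Write lines to a new file; the input file is never opened for writing."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


file1 = [
    "//Name 1 x",
    ">tex_1 abcdefghijklmnopqrstu",
    ">tex_2 abcdefghijklmnopqrstuv",
    "//Name 2 x x",
    ">tex_1 abcdefghijklmnopqrstu",
    ">tex_3 abcdefghijklmnopqrst",
]
for line in first_of_each_group(file1):
    print(line)
```

Calling save(first_of_each_group(file1), "File2.txt") would then produce the second file while leaving the original untouched.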
Change the structure so the naming is like the following, and output this to File3:
>Name 1 abcdefghijklmnopqrstu
>Name 2 abcdefghijklmnopqrstu
>Name 3 abcdefghijklmnopqrstuv
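One way this last restructuring could look in Python. The sketch assumes each name is exactly two words after the "//" ("Name" plus a number) and that a data line is an identifier, whitespace, then the sequence. The function name to_named_records is invented for the example.

```python
def to_named_records(lines):
    """Turn each '//Name N ...' line plus its first data line into '>Name N seq'."""
    records = []
    name = None
    for line in lines:
        if line.startswith("//"):
            # "//Name 1   x x" -> "Name 1" (assumes a two-word name)
            name = " ".join(line[2:].split()[:2])
        elif line.startswith(">") and name is not None:
            # ">tex_1 abc" -> "abc": drop the old identifier, keep the rest
            parts = line.split(maxsplit=1)
            sequence = parts[1] if len(parts) > 1 else ""
            records.append(f">{name} {sequence}".rstrip())
            name = None  # only the first data line of a group is used
    return records


file2 = [
    "//Name 1 x",
    ">tex_1 abcdefghijklmnopqrstu",
    "//Name 2 x x",
    ">tex_1 abcdefghijklmnopqrstu",
    "//Name 3 x x",
    ">tex_1 abcdefghijklmnopqrstuv",
]
for record in to_named_records(file2):
    print(record)
```

Because the name is reset after its first data line, the same function also works on the full named file, not only on the reduced File2.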
The above data is in a structure I can actually analyse.
Now I know this is quite the task (especially for a complete programming noob), and I am not asking you "how do I program this?". I would just like to know where you would start with such a project, and which language you think fits it best.
I managed to do a few things in Unix by getting help on this site, e.g. giving a unique name to each "//..." line with the following awk command:
awk -F '' '/\/\//{n++ ; t=" Name "n ; sub("// {0,"length(t)-1"}","//"t)}{print}' File1.txt
Could you give me some hints on where to start? Is the problem suitable as a Python project? The original .txt data file contains a lot of data, so it is not possible to do the processing by hand. This project is also meant as a way to get further into programming.
Thank you!