Please guide with the following data formatting.
I have Readings for Names, from Machines and sometimes these readings are replicated. Where a reading was not found, it is left as blank.
Name Instrument Rep R1 R2 R3
N1 I1 1 1 2 3
N2 I1 1 1 3 4
N1 I1 2 2 3 4
N3 I1 2 3 4 5
N1 I2 1 1 2 3
N2 I2 1 1 3 4
N2 I2 2 2 3 4
N3 I2 1 3 4 5
N1 I3 1 1 4
N2 I3 1 2 5
N3 I3 1 6
N3 I3 2 1
First, I want to consolidate the replicates by using their mean (per name per location). Then, I want to transpose this data and replace the missing values by dot (.) .
What I want in my output is
Reading Instrument N1 N2 N3
R1 I1 1.5 1 3
R2 I1 2.5 3 4
R3 I1 3.5 4 5
R1 I2 1 1.5 3
R2 I2 2 3 4
R3 I2 3 4 5
R1 I3 1 2 .
R2 I3 . 5 6
R3 I3 4 . 1
Please note that the number of Names and Readings are quite variable, in some files I have 134 readings, some others have 28 etc. but the readings always start from col3.
This is what I tried unsuccessfully for a test run for only one column
awk '
NR>1{
arr[$1" "$2" "$3] += $4
count[$1" "$2" "$3] += 1
}
END{
for (a in arr) {
print a, arr[a] / count[a]
}
}
' file | awk '
NR == 1 {
n = NF
for (i = 1; i <= NF; i++)
row[i] = $i
next
}
{
if (NF > n)
n = NF
for (i = 1; i <= NF; i++)
row[i] = row[i] " " $i
}
END {
for (i = 1; i <= n; i++)
print row[i]
}'
 
Aucun commentaire:
Enregistrer un commentaire