Unix & Linux: Extracting values from a file keyed by multiple keys

Monday, 12 January 2015

Extracting values from a file keyed by multiple keys

Consider a file of key=value pairs in which each key may be a comma-separated list of several keys. In other words, many keys can map to one value. The reason is that each key is a short word compared with the length of its value, so the data is 'compressed' into fewer lines.

Illustration (i.e. not the real values):


$ cat testfile
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value

It is safe to assume that all keys are unique and do not contain any of the following characters:

key delimiter: ,

key-value delimiter: =

whitespace characters

Keys may take any form in the future (within the above constraints), but if it helps, they currently happen to match the following regex: [[:upper:]]{2}[[:upper:]0-9]. Likewise, values will not contain =, so = can safely be used to split each line.

To make it easier to extract data from this file, a function getval() is defined as follows:


getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}

With this definition, calling getval A returns the value a-very-long-value.
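To make the behaviour concrete, here is a small self-contained sketch that recreates the sample data and looks up a first, middle, last and single key. The datafile variable and the temporary file are assumptions made for this sketch so that it runs anywhere; the sed expression itself is unchanged from the definition above, and it has only been reasoned through for GNU sed.

```shell
# Recreate the sample data in a temporary file.
# "datafile" is a name assumed for this sketch; the original reads ./testfile.
datafile=$(mktemp)
trap 'rm -f "$datafile"' EXIT
cat > "$datafile" <<'EOF'
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
EOF

# Same sed expression as above, reading "$datafile" instead of testfile.
getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" "$datafile"
}

getval A     # first key of a group
getval E     # middle key of a group
getval K3    # last key of a group
getval Z     # line with a single key
getval K     # only a prefix of K1, K2, K3: no line matches, nothing is printed
```

Note that $1 is pasted into the regular expression as-is, so a key containing characters such as ., *, [ or / would be interpreted by sed rather than matched literally. The keys described above do not contain such characters.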

Questions:

Is the current definition of getval() robust enough?

Are there alternative ways of performing the data extraction that are possibly shorter/more expressive/more restrictive?

For what it's worth, this script will run with Cygwin's bash. Thanks!
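One possible direction for the second question, offered only as a sketch: split each line on = with awk, split the key part on commas, and compare each key to the requested one as a plain string instead of building a regular expression. The function name getval_awk, its optional second argument (the data file) and the sample variable are all hypothetical names introduced here for illustration.

```shell
# Hypothetical alternative to getval(): $1 is the key, $2 the data file
# (defaults to testfile, as in the original function).
getval_awk() {
    awk -F= -v key="$1" '
        {
            # $1 is the comma-separated key list, $2 the value
            # (values contain no "=", so $2 is the whole value).
            n = split($1, keys, ",")
            for (i = 1; i <= n; i++)
                # Appending "" forces a string comparison, not a numeric one.
                if ((keys[i] "") == (key "")) { print $2; exit }
        }
    ' "${2:-testfile}"
}

# Usage against a temporary copy of the sample data.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
EOF

getval_awk K2 "$sample"
getval_awk Z "$sample"
```

Because the key is compared rather than interpolated into a pattern, regex metacharacters in a key need no escaping. One caveat: awk -v interprets backslash escape sequences in the assigned value, so a key containing a backslash would need different handling.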
