Monday, 12 January 2015

Extracting values from a file keyed by multiple keys


Consider a file of key=value pairs in which the key part of a line may be a comma-separated list of several keys. In other words, many keys can map to one value. Each key is a short word compared to the length of the value, so grouping keys this way 'compresses' the data into fewer lines.


Illustration (i.e. not the real values):



$ cat testfile
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value


It is safe to assume that all keys are unique and that they will not contain any of the following characters:



  • key delimiter: ,

  • key-value delimiter: =

  • whitespace characters


Keys may take any form in the future (subject to the constraints above), but if it helps, they currently all happen to match the following regex: [[:upper:]]{2}[[:upper:]0-9]. Likewise, values will not contain =, so = can safely be used to split each line.
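To make the splitting rule concrete, here is a minimal sketch using only shell parameter expansion. The variable names (line, keys, value, key_count) are chosen purely for illustration, and the sketch assumes the constraints listed above hold.

```shell
# Sample line taken from the illustration above.
line='K1,K2,K3=many-many-more'

keys=${line%%=*}     # text before the first '='
value=${line#*=}     # text after the first '='

printf 'value: %s\n' "$value"

# Walk the comma-separated key list. Word splitting on ',' is safe here
# only because keys are assumed to contain no ',' and no whitespace.
key_count=0
old_ifs=$IFS
IFS=,
set -f                       # no pathname expansion while splitting
for key in $keys; do
    key_count=$((key_count + 1))
    printf 'key %d: %s\n' "$key_count" "$key"
done
set +f
IFS=$old_ifs
```

Because values never contain =, cutting at the first = and cutting at the last = would give the same result.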


To simplify data extraction from this file, a function getval() is defined as follows:



getval() {
    # Print the value of the first line whose key list contains key $1, then quit.
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}


With this definition, calling getval A prints the value a-very-long-value.
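The behaviour can be checked with a small self-contained session. This is only a sketch: it recreates the illustrative file in a scratch directory (the workdir variable is an assumption of the example) and assumes GNU sed, as shipped with Cygwin.

```shell
# Recreate the illustrative file in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > testfile <<'EOF'
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
EOF

getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}

getval A     # first key on its line
getval K2    # key in the middle of its line
getval Z     # line with a single key
getval K     # only a prefix of K1, K2 and K3: prints nothing
getval .     # the argument is used as a regex, so '.' matches the key A
```

The last call shows where the robustness question comes from: the argument is spliced into the pattern unescaped, so a key containing regex metacharacters is not compared literally.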


Questions:



  • Is the current definition of getval() robust enough?

  • Are there alternative ways of performing the data extraction that are possibly shorter/more expressive/more restrictive?
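To make the second question concrete, one possible direction is to compare keys as literal strings instead of building a regex from the argument. The following is a minimal sketch, not a definitive answer: the name getval_awk is hypothetical, and the scratch directory exists only to make the example self-contained.

```shell
# Recreate the illustrative file in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > testfile <<'EOF'
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
EOF

# Hypothetical alternative to getval(): literal key comparison in awk.
getval_awk() {
    awk -F= -v key="$1" '
        /=/ {
            # $1 is the key list; split it on the key delimiter.
            n = split($1, keys, ",")
            for (i = 1; i <= n; i++)
                # Appending "" forces a string comparison, so numeric-looking
                # keys such as 1 and 1.0 are not treated as equal.
                if ((keys[i] "") == (key "")) {
                    print substr($0, index($0, "=") + 1)
                    exit
                }
        }
    ' testfile
}

getval_awk B     # prints a-very-long-value
getval_awk K     # prefix only: prints nothing
getval_awk .     # compared literally: prints nothing
```

One caveat of this sketch is that awk interprets backslash escapes in -v assignments, so it assumes keys contain no backslashes.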


For what it's worth, this script will run under Cygwin's bash. Thanks!


