Wednesday, January 14, 2015

Remove a portion of a line enclosed between two different delimiters


Most of the command-line tools I'm looking at let me pick a field delimiter. However, I'd like to use one delimiter to mark the start, and a different one to mark the end, of the segment of text I want to remove from each line I process.



text [blah blah blah] text number punctuation text text
text text text
text text (text) [blah blah blah] number text
text <url> <email> text [blah blah blah] text


I'd like to remove all the '[blah blah blah]' parts from those lines.


The 'blah' part can contain anything except newlines, EOF, other line-breaking characters, and '['. In other words, '[[' (or '[blah[') should never appear anywhere in the data.


Each line has at most one [...] segment, and it is optional. Line 2, for example, has nothing to remove, and that shouldn't cause the command to halt or fail.
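A minimal sketch with POSIX sed, assuming the sample lines are saved in a file (called `data` here purely for illustration; `remove_brackets` is likewise a hypothetical helper name). The pattern matches a literal `[`, then any run of characters that are neither `]` nor `[`, then a literal `]`, and replaces that whole match with nothing. A line with no bracketed segment, like line 2, simply doesn't match and is printed unchanged, so nothing halts or fails.

```shell
#!/bin/sh
# Remove the first [...] segment on each line; lines without one pass through unchanged.
#   \[       a literal '[' (escaped, otherwise it would start a bracket expression)
#   [^][]*   zero or more characters that are neither ']' nor '['
#   ]        a literal ']' (outside a bracket expression it needs no escape)
# The brackets are removed together with their contents.
remove_brackets() {
  sed 's/\[[^][]*]//'
}

# Sample input copied from the lines above.
printf '%s\n' \
  'text [blah blah blah] text number punctuation text text' \
  'text text text' \
  'text text (text) [blah blah blah] number text' \
  'text <url> <email> text [blah blah blah] text' | remove_brackets
```

On the real file this would be `remove_brackets < data`, or directly `sed 's/\[[^][]*]//' data`. Because `[^][]*` cannot step over a `]`, the match stops at the first closing bracket instead of running on to the end of the line, and since only `[` starts a match, other punctuation such as the `<` and `(` in lines 3 and 4 is never touched.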


I'm almost 100% sure that every opening '[' has a matching ']', but it would be nice to check for that.
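One way to do that check before removing anything is a grep with two patterns. This is a sketch that relies on the format described above (at most one [...] per line, no nested '['); `check_brackets` is a hypothetical name. It prints, with line numbers, every line that has a `[` with no `]` after it, or a `]` with no `[` before it.

```shell
#!/bin/sh
# Report (with line numbers) lines whose brackets look unbalanced:
#   '\[[^]]*$'   a '[' followed by no ']' up to the end of the line
#   '^[^[]*]'    a ']' with no '[' anywhere before it on the line
# No output means every '[' has a following ']' (given one pair per line at most).
check_brackets() {
  grep -n -e '\[[^]]*$' -e '^[^[]*]' "$@"
}

printf '%s\n' 'text [blah] text' 'text [blah text' 'text blah] text' 'text text' | check_brackets
```

With a file it would be `check_brackets data`. grep exits with status 1 when it finds nothing, so something like `if check_brackets data; then echo "unbalanced"; fi` could gate the removal step.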


The lines contain other punctuation too, so I don't want a solution that simply starts removing at the first non-alphanumeric character (see line 4).


Bonus points for detecting when the removal leaves two (now adjacent) spaces at that particular point, and collapsing them into one, without removing double spaces anywhere else in the line.
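A possible sketch for the bonus, again in POSIX sed (`remove_brackets_tidy` is a hypothetical name). The first expression only fires when the segment has a space on both sides, and swaps " [...] " for a single space; the second handles every other case by deleting just the segment. Neither pattern can match without a `[`, so double spaces elsewhere on the line survive.

```shell
#!/bin/sh
# Expression 1: " [...] " (space on both sides) becomes a single space.
# Expression 2: otherwise (segment at line start/end, or next to punctuation)
#   delete only the segment itself.
# After expression 1 succeeds there is no '[' left, so expression 2 does nothing.
# Both patterns require a '[', so other double spaces on the line are untouched.
remove_brackets_tidy() {
  sed -e 's/ \[[^][]*] / /' -e 's/\[[^][]*]//'
}

printf '%s\n' 'text  text [blah blah blah] more  text' | remove_brackets_tidy
```

A limitation of this sketch: a segment at the very start or end of a line (e.g. `[blah] text`) still leaves one leading or trailing space, because there is no second space to merge with.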


I'm pretty sure I'll have to use awk or sed, but if there's a way to do this with other standard command-line tools, to keep it as portable as possible, that would be ideal.
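For the portability angle, here is a sketch that uses no external tools at all, only POSIX shell parameter expansion (`strip_brackets` is a hypothetical name). `${line%%\[*}` drops everything from the first `[` onward, and `${line#*]}` drops everything up to and including the first `]`; joining the two removes the segment.

```shell
#!/bin/sh
# Remove the first [...] segment using only POSIX shell features (no sed/awk).
#   ${line%%\[*}  strip the longest suffix starting at a '[' -> text before the first '['
#   ${line#*]}    strip the shortest prefix ending at a ']'  -> text after the first ']'
# Lines without a '[' followed later by a ']' are printed unchanged.
strip_brackets() {
  # '|| [ -n "$line" ]' also processes a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *\[*\]*)
        before=${line%%\[*}
        after=${line#*]}
        printf '%s%s\n' "$before" "$after"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}

printf '%s\n' 'text [blah blah blah] text' 'text text text' | strip_brackets
```

Usage would be `strip_brackets < data`. `IFS=` and `read -r` keep leading whitespace and backslashes intact. The trade-off is speed: a shell `while read` loop is much slower than a single sed call on large files, and sed itself is specified by POSIX, so it is already about as portable as it gets.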


Also, an explanation of what you're doing (if you're using regex / sed) would certainly help, since what I've found and tried so far hasn't quite worked:




A suggestion here (http://ift.tt/1zcyKSw) says:



sed 's/^.%([^ ]) .\$([^$])$/\1 \2/' infile



I got that kinda working with this bit of monkeying:



cat data | sed 's/^.[([^ ]) .]([^$])$/\1 \2/'



(Hmm, some of the wildcards and other characters aren't showing up, and I don't know SE's markup well enough to fix it.)


However, it doesn't remove the whole swath of 'blah blah blah', and it leaves an extra line break.
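For comparison with the capture-group attempt above, here is a sketch of the same idea written so that it removes the whole bracketed segment. It is a reconstruction for illustration, not the linked answer's exact command, and `remove_with_groups` is a hypothetical name. In POSIX basic regular expressions (sed without `-E`), groups are written `\(`...`\)`; bare parentheses match literal `(` and `)` characters.

```shell
#!/bin/sh
# Capture-group form of the removal (POSIX basic regular expressions):
#   ^\(.*\)    group 1: everything before the '['
#   \[[^][]*]  the bracketed segment, brackets included (not captured, so dropped)
#   \(.*\)$    group 2: everything after the ']'
# The replacement \1\2 glues the two kept parts back together.
# Lines without a [...] segment don't match and are printed unchanged.
remove_with_groups() {
  sed 's/^\(.*\)\[[^][]*]\(.*\)$/\1\2/'
}

printf '%s\n' 'text text (text) [blah blah blah] number text' | remove_with_groups
```

Since only the segment is left out of the replacement, this behaves the same as the shorter `s/\[[^][]*]//`; the anchored version is mainly useful as a template when you want to rearrange or inspect the parts around the brackets.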




The question "Using cut/awk/sed with two different delimiters"


doesn't really answer this in a general sense (or at least I couldn't work out a solution after reading it; maybe that's a failure on my part). It seems to be too specifically tailored to that person's data.

