I have a directory with several hundred csv files whose filenames start with a two-digit prefix in {01..84}. Since several hundred > 84, many filenames necessarily share the same prefix. I want to concatenate the files that share a prefix, keeping only one header line. Here's what I've got:
#!/bin/bash
for i in {01..84}; do
    # declare array to store files with same prefix
    declare -a files=()
    echo "Processing $i"
    for j in `ls $i*.csv`; do
        # add files with same prefix to array
        files=("${files[@]}" "$j")
    done
    # cat first file including header with the rest of the files without the headers
    cat < ${files[@]:0:1} <(tail -n+2 ${files[@]:1}) > "$i".csv
done
So far so good ... except that it grinds to a halt at $i=22, halfway through (reproducibly), and pollutes the output files with blank lines and headers like `==> 19XXX.csv <==` (without the backticks).
What should I change in the code to just get a nice clean csv file for each prefix without the script crashing?
Are there any precompiled bash utilities that I can call to do any of this quicker and easier?
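(For anyone hitting the same symptom: the `==> name <==` banners come from `tail` itself, not `cat`. Given more than one file operand, `tail` prints a banner and a blank line before each file's output; GNU tail's `-q` flag suppresses them. A minimal reproduction with throwaway files:)

```shell
# tail with multiple file operands prints "==> name <==" banners;
# GNU tail's -q (quiet) flag suppresses them.
cd "$(mktemp -d)"                 # scratch directory, illustration only
printf 'header\na\n' > 01a.csv
printf 'header\nb\n' > 01b.csv
tail -n+2 01a.csv 01b.csv         # banners + blank lines pollute output
tail -q -n+2 01a.csv 01b.csv      # clean: just the data lines
```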
Edit: working code solution for anyone who just came here for a copy-paste:
#!/bin/bash
for i in {01..84}; do
    # declare array to store files with same prefix
    declare -a files=()
    echo "Processing $i"
    for j in `ls $i*.csv`; do
        # add files with same prefix to array
        files=("${files[@]}" "$j")
    done
    # cat first file including header, then the rest without their headers
    if [ ${#files[@]} -gt 1 ]; then
        cat "${files[@]:0:1}" <(tail -q -n+2 "${files[@]:1}") > "$i".csv
    else
        cat "${files[@]:0:1}" > "$i".csv
    fi
done
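(Why the single-file branch is needed, and the likely reason the original script hung at $i=22: if a prefix matches exactly one file, `${files[@]:1}` expands to nothing, so `tail` has no file operands and falls back to reading stdin, blocking forever inside a script. A minimal illustration, with a made-up array:)

```shell
# With a one-element array, "${files[@]:1}" expands to zero words,
# so tail gets no file operands and reads stdin instead of a file.
files=(22only.csv)                 # hypothetical prefix with a single file
printf 'header\ndata\n' | tail -n+2 "${files[@]:1}"   # reads the pipe, not a file
```

In the script there is no pipe feeding stdin, so `tail` simply waits, which looks like a hang at whichever prefix happens to have only one file.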
Edit 2: Stéphane Chazelas' way. Much cleaner.
#!/bin/bash
for i in {01..84}; do
    echo "processing $i"
    awk 'NR==FNR||FNR>1' $i?*.csv >> "$i".csv
done
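(How the one-liner works: `NR` counts lines across all input files while `FNR` restarts at 1 for each file, so `NR==FNR` is true only while awk reads the first file, keeping it whole, header included; for every later file `FNR>1` drops just its first line. A quick sanity check with throwaway files:)

```shell
# NR==FNR keeps all of the first file; FNR>1 skips each later header.
cd "$(mktemp -d)"                 # scratch directory, illustration only
printf 'header\n1\n' > 05a.csv
printf 'header\n2\n' > 05b.csv
awk 'NR==FNR||FNR>1' 05a.csv 05b.csv   # one header, then both data lines
```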