Monday, 12 January 2015

How to concatenate a variable number of CSV files, removing their header rows?


I have a directory with several hundred CSV files whose filenames start with two digits in the range {01..84}. Since there are far more than 84 files, many filenames share the same prefix. I want to concatenate the files that share a prefix, keeping only one header row per output file. Here's what I've got:



#!/bin/bash
for i in {01..84}; do
#declare array to store files with same prefix
declare -a files=()
echo "Processing $i"
for j in `ls $i*.csv`; do
#add files with same prefix to array
files=("${files[@]}" "$j")
done
#cat first file including header with the rest of the files without the headers
cat < ${files[@]:0:1} <(tail -n+2 ${files[@]:1}) > "$i".csv
done


So far so good ... only, it grinds to a halt at $i=22 halfway through (repeatable error), and pollutes the output files with blank lines and headers like "==> 19XXX.csv <==" (without quotes).




  1. What should I change in the code to just get a nice clean csv file for each prefix without the script crashing?




  2. Are there any precompiled bash utilities that I can call to do any of this quicker and easier?




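Edit: working code solution for anyone who just came here for a copy-paste. It passes -q to tail, which suppresses the "==> file <==" banners that tail prints when given several files, and it handles prefixes with only one file, where tail would otherwise get no file arguments and wait on standard input: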



#!/bin/bash
for i in {01..84}; do
echo "Processing $i"
#collect files with the same prefix; the "?" keeps the output file "$i.csv" out of the match
files=("$i"?*.csv)
#skip prefixes with no matching files (an unmatched glob stays literal)
[ -e "${files[0]}" ] || continue
#cat first file including header with the rest of the files without the headers
if [ ${#files[@]} -gt 1 ]; then
cat "${files[0]}" <(tail -q -n +2 "${files[@]:1}") > "$i".csv
else
cat "${files[0]}" > "$i".csv
fi
done
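
The "==> 19XXX.csv <==" lines in the broken output come from tail itself: when it is given more than one file it labels each file's output unless -q is passed. A minimal sketch of the difference, using made-up sample files in a temporary directory (the file names and contents are assumptions for illustration only):

```shell
#!/bin/bash
# Made-up sample files in a throwaway directory.
tmp=$(mktemp -d)
printf 'h\n1\n' > "$tmp/a.csv"
printf 'h\n2\n' > "$tmp/b.csv"

# Without -q, tail prefixes each file's output with a "==> name <==" banner.
with_banner=$(tail -n +2 "$tmp/a.csv" "$tmp/b.csv")

# With -q the banners are suppressed and only the data rows remain.
quiet=$(tail -q -n +2 "$tmp/a.csv" "$tmp/b.csv")

printf '%s\n' "$quiet"
```

Printing "$with_banner" instead shows the banners and blank separator lines that ended up in the output files.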


Edit 2: Stéphane Chazelas' way. Much cleaner.



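#!/bin/bash
for i in {01..84}; do
echo "processing $i"
#keep every line of the first file (NR==FNR) and skip line 1 of the others (FNR>1);
#the "?" stops the glob from matching the output file "$i.csv" itself,
#and ">" truncates, so re-running the script does not duplicate rows
awk 'NR==FNR||FNR>1' "$i"?*.csv > "$i".csv
done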
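
The awk condition can be tried on its own. NR counts lines across all input files while FNR restarts at 1 for each file, so NR==FNR is true only while the first file is being read. The sketch below wraps the one-liner in a function and runs it on made-up data; the name concat_csv and the sample files are hypothetical, not part of the original answer:

```shell
#!/bin/bash
# concat_csv is a hypothetical helper name used only for this illustration.
# Usage: concat_csv OUTPUT INPUT...
concat_csv() {
    local out=$1
    shift
    # NR==FNR holds only while awk reads the first file, so its header is kept;
    # FNR>1 skips the first line (the header) of every later file.
    awk 'NR==FNR || FNR>1' "$@" > "$out"
}

# Demo on made-up sample data in a temporary directory.
demo_dir=$(mktemp -d)
printf 'id,name\n1,a\n' > "$demo_dir/01_x.csv"
printf 'id,name\n2,b\n' > "$demo_dir/01_y.csv"

# "01?*.csv" needs at least one character between the prefix and ".csv",
# so the output file 01.csv is never picked up as an input.
concat_csv "$demo_dir/01.csv" "$demo_dir"/01?*.csv
cat "$demo_dir/01.csv"
```

One caveat of this condition: if the first file is empty, NR==FNR stays true into the second file, so a check for empty inputs may be worth adding.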

