Unix & Linux: How to concatenate a variable number of csv, removing their header rows?

Monday, 12 January 2015

I have a directory with several hundred CSV files whose filenames start with a two-digit prefix in the range {01..84}. Since there are far more than 84 files, many filenames share the same prefix. I want to concatenate the files that share a prefix, keeping the header row only once per output file. Here's what I've got:


#!/bin/bash
for i in {01..84}; do
        #declare array to store files with same prefix
        declare -a files=()
        echo "Processing $i"
        for j in `ls $i*.csv`; do
                #add files with same prefix to array
                files=("${files[@]}" "$j")
        done    
        #cat first file including header with the rest of the files without the headers 
        cat < ${files[@]:0:1} <(tail -n+2 ${files[@]:1}) > "$i".csv
done

So far so good, except that it reproducibly grinds to a halt partway through, at $i=22, and it pollutes the output files with blank lines and header lines like "==> 19XXX.csv <==" (without the quotes).

What should I change in the code to just get a nice clean csv file for each prefix without the script crashing?

Are there any existing command-line utilities that I could call to do any of this more quickly and easily?

Edit: working code solution for anyone who just came here for a copy-paste:


#!/bin/bash
for i in {01..84}; do
    #declare array to store files with same prefix
    declare -a files=()
    echo "Processing $i"
    #"?*" needs at least one character after the prefix, so an existing "$i".csv is not read back in
    for j in "$i"?*.csv; do
        #add files with same prefix to array; skip the unexpanded pattern when nothing matches
        [ -e "$j" ] && files+=("$j")
    done
    #nothing to do for a prefix without files
    [ ${#files[@]} -eq 0 ] && continue
    #cat first file including header with the rest of the files without the headers
    if [ ${#files[@]} -gt 1 ]; then
        cat "${files[0]}" <(tail -q -n+2 "${files[@]:1}") > "$i".csv
    else
        cat "${files[0]}" > "$i".csv
    fi
done
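
The two changes in this version appear to line up with the two symptoms from the question. When tail is given several files it prints a "==> name <==" banner before each one, with blank lines in between, and -q turns that off. When a prefix has only one file, there are no remaining files to pass, so tail gets no file operands and waits on standard input, which would plausibly explain the stall at 22. Below is a minimal sketch of both points. The function name strip_headers is hypothetical and used only for illustration, and -q is assumed to be available (it is in GNU tail):

```shell
# strip_headers (hypothetical helper): print each given file minus its first line
strip_headers() {
    # with no file operands tail would read standard input and appear to hang,
    # so return early when there is nothing to process
    [ "$#" -gt 0 ] || return 0
    # -q suppresses the "==> name <==" banners tail prints for multiple files
    # -n +2 starts the output at line 2, i.e. drops the header row
    tail -q -n +2 -- "$@"
}
```

Running plain tail -n +2 on two files and comparing it with strip_headers on the same files shows the banners appear only in the first case.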

Edit 2: Stéphane Chazelas' way. Much cleaner.


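#!/bin/bash
for i in {01..84}; do
    echo "processing $i"
    #NR==FNR is true only while the first file is read (header kept); FNR>1 drops the header of every later file
    #"?*" keeps the output file "$i".csv out of the input; ">" overwrites, so a rerun does not append duplicates
    awk 'NR==FNR||FNR>1' "$i"?*.csv > "$i".csv
done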
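
For anyone wondering why the awk condition is enough: NR counts lines across all input files, while FNR restarts at 1 for each file. The two are equal only while the first file is being read, so that file passes through whole, header included. For every later file only the lines with FNR>1 pass, which drops its header. A small sketch of that logic follows. The wrapper name merge_csv is hypothetical, and it writes to standard output so it can be tried on a few test files first:

```shell
# merge_csv (hypothetical wrapper): concatenate CSV files, keeping only the first header
merge_csv() {
    # NR   = line number counted over all files
    # FNR  = line number within the current file
    # NR==FNR -> still in the first file, print everything
    # FNR>1   -> any later file, print everything except its first line
    awk 'NR==FNR || FNR>1' "$@"
}
```

With three files that share a header, merge_csv 01a.csv 01b.csv 01c.csv should print the header once followed by the data rows of all three files in order.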
