Some success with sed, but need help finding blank line

#1

A colleague was faced with 804 ASCII files, and he needed to extract the tabular data from the first part and the last part of each file into Excel spreadsheets. He couldn’t figure out a way to do this with Windows tools. I told him he needed Linux tools, and that sed was his answer. Since I could see that the useful data in the first part of each file was always on lines 2-44, I was able to get that data into new files with a shell script like:
[color=#1E90FF][font=Courier]for file in *.lst
do
    # delete every line outside 2-44, i.e. keep only lines 2-44
    sed '2,44!d' "$file" > …/Listout/"$file"
done[/font][/color]
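A self-contained sketch of the same line-range extraction, for trying it without the real files. It fakes a 50-line input with `seq` (a stand-in, not one of the colleague’s files) and shows that `sed '2,44!d'` (delete every line outside 2-44) and `sed -n '2,44p'` (print only lines 2-44) should produce identical output.

```shell
#!/bin/sh
# Work in a throwaway directory; it is removed when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Stand-in for a real .lst file: 50 numbered lines.
seq 1 50 > "$tmp/sample.lst"

# Form used above: delete every line that is NOT in the range 2-44.
sed '2,44!d' "$tmp/sample.lst" > "$tmp/out_delete.txt"

# Equivalent form: turn off automatic printing (-n) and print only lines 2-44.
sed -n '2,44p' "$tmp/sample.lst" > "$tmp/out_print.txt"

# The two outputs should match byte for byte.
cmp "$tmp/out_delete.txt" "$tmp/out_print.txt" && echo "same output"
```

Which form to use is mostly taste; `-n ... p` reads as "print these lines", `!d` as "delete everything else".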

And I got the last part of each file by keying on the string ‘Clock’:
[color=#1E90FF][font=Courier]for file in *.tab
do
    # keep from the first line containing "Clock" through the last line ($)
    sed '/Clock/,$!d' "$file" > ./Tableout/"$file"
done[/font][/color]
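A minimal sketch of what the `/Clock/,$` range does, run on a made-up miniature file (the header and table lines are hypothetical, loosely modeled on the St/Clock table header). The range opens at the first line containing Clock and runs through the last line; a file without any Clock line yields empty output.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up miniature .tab file: two header lines, then a table whose header contains "Clock".
printf '%s\n' 'header line 1' 'header line 2' 'St Clock Loc Depth' '00 09:17 6.70' '01 09:20 7.10' > "$tmp/sample.tab"

# Keep from the first line containing "Clock" through the last line ($).
sed '/Clock/,$!d' "$tmp/sample.tab" > "$tmp/table.txt"
cat "$tmp/table.txt"

# A file with no "Clock" line: the range never opens, so every line is deleted.
printf '%s\n' 'no table here' > "$tmp/notable.tab"
sed '/Clock/,$!d' "$tmp/notable.tab" > "$tmp/none.txt"
```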

But some of these files have additional useful data in the middle, starting at the line that begins with a blank character followed by ‘Record’, just below the line that begins with ‘Supplemental’. The number of data lines after that header line varies from file to file, which makes it tricky. The data in this part of the files looks like this:
[color=#C71585][font=Courier][size=small]Supplemental_Data
Record Date Time Location(ft) Gauge_Height(ft) Rated_Flow(cfs) Comments
01 2011/02/09 09:16:46 0.000 () 86.5050

St Clock Loc Depth IceD %Dep MeasD Npts Spike Vel SNR Angle Verr Bnd Temp CorrFact MeanV Area Flow %Q
() () (ft) (ft) (ft) (*D) (ft) () () (ft/s) (dB) (deg) (ft/s) () (degF) () (ft/s) (ft^2) (cfs) (%)
00 09:17 6.70 0.000 0.000 0.0 0.000 0 0 0.0000 0.0 0 0.0000 0 0.00 1.00 0.0000 0.000 0.0000 0.0
[/size][/font][/color]

I thought I could get sed to keep everything from ‘Record’ through the first blank line that follows it, but this approach didn’t work for me:
[color=#1E90FF][font=Courier]for file in *.tab
do
    sed '/Record/,/ /!d' "$file" > …/sup/"$file".sup
done[/font][/color]

The result contained every line from ‘Record’ to the end of the file.
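One way to get exactly that symptom, sketched on a hypothetical input: a sed range only closes when a later line matches the end address, and if no line ever does, the range runs to end of file. The sketch assumes, purely for illustration, that the fields are TAB-separated so no line contains a literal space for `/ /` to match; that is not confirmed for the real files.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical input with TAB-separated fields, so no line contains a space.
printf 'Supplemental_Data\nRecord\tDate\tTime\n01\t2011/02/09\t09:16:46\n\nSt\tClock\tLoc\n' > "$tmp/tabs.tab"

# The end address / / never matches, so the range runs from "Record" to end of file.
sed '/Record/,/ /!d' "$tmp/tabs.tab" > "$tmp/to_eof.txt"

# With an end address that does match (an empty line here), the range stops there.
sed '/Record/,/^$/!d' "$tmp/tabs.tab" > "$tmp/to_blank.txt"
```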

It would be trickier still to use grep first, so that only the files containing the string ‘Supplemental’ get passed to sed.
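It may not be that tricky: `grep -l` prints only the names of files that contain a match, and those names can be read one per line into the sed loop. A minimal sketch, assuming two made-up files (a.tab with a Supplemental section, b.tab without) with Unix line endings, and writing into a local sup/ directory rather than the real output path:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/sup"

# Two hypothetical .tab files: only a.tab has a Supplemental section.
printf 'Supplemental_Data\n Record Date Time\n01 2011/02/09 09:16:46\n\nSt Clock Loc\n' > "$tmp/a.tab"
printf 'St Clock Loc\n00 09:17 6.70\n' > "$tmp/b.tab"

cd "$tmp" || exit 1
# grep -l lists each file that has at least one matching line, one name per line.
grep -l '^Supplemental' *.tab | while IFS= read -r file; do
    # Keep from " Record" through the first empty line.
    sed '/^ Record/,/^$/!d' "$file" > "sup/$file.sup"
done
```

Reading names with `while read` rather than `for file in $(grep -l ...)` avoids word splitting on file names that contain spaces.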

Or is there a better approach to this than sed in a shell script?


#2

I just tried detecting the blank line with
for file in *.tab
do
    sed '/Record/,/^[ ]*$/!d' "$file" > …/sup/"$file".sup
done

but that gave me the same result: every line from Record to the end of the file. The blank line isn’t actually blank; it contains only a ^M (carriage return). I can’t key on ^M, because every line ends with one.
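Since the POSIX bracket class `[[:space:]]` includes the carriage return, an end address of `/^[[:space:]]*$/` should match a line holding nothing but ^M, with no conversion step. A sketch on a hypothetical CRLF file built with printf; it also shows plain `/^$/` running on to end of file in that case. Side effect worth noting: the output keeps its CRLF endings, since sed only deletes lines here.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical CRLF (DOS) file: every line ends in \r, and the "blank" line is just \r.
printf 'Supplemental_Data\r\n Record Date Time\r\n01 2011/02/09 09:16:46\r\n\r\nSt Clock Loc\r\n' > "$tmp/dos.tab"

# /^$/ never matches here, because the separator line still holds a carriage return.
sed '/^ Record/,/^$/!d' "$tmp/dos.tab" > "$tmp/plain.out"

# [[:space:]] includes the carriage return, so this end address matches the \r-only line.
sed '/^ Record/,/^[[:space:]]*$/!d' "$tmp/dos.tab" > "$tmp/space.out"
```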


#3

This is what I finally used:

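for file in *.tab
do
    # use grep to find only the files containing the "Supplemental" string
    if grep -q '^Supplemental' "$file"; then
        # strip trailing whitespace (including the DOS ^M), then keep " Record" through the first blank line
        sed 's/[[:space:]]*$//' "$file" |
            sed '/^ Record/,/^$/!d' > …/sup/"$file".sup
    fi
done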
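The two chained sed processes can also be folded into one, because sed runs each `-e` expression in order on the same line: the trailing-whitespace substitution happens before the range address is tested. A sketch comparing both forms on a made-up CRLF file (the contents are hypothetical):

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical CRLF file with a Supplemental section.
printf 'Supplemental_Data\r\n Record Date Time\r\n01 2011/02/09 09:16:46\r\n\r\nSt Clock Loc\r\n' > "$tmp/dos.tab"

# Two-process version, as in the loop above.
sed 's/[[:space:]]*$//' "$tmp/dos.tab" | sed '/^ Record/,/^$/!d' > "$tmp/two.out"

# One-process version: the s command strips trailing whitespace (including \r)
# before the range address is checked against the same line.
sed -e 's/[[:space:]]*$//' -e '/^ Record/,/^$/!d' "$tmp/dos.tab" > "$tmp/one.out"
```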

It seemed to be necessary to use the first sed command to convert from DOS to UNIX line endings; otherwise /^$/ in the second sed command wouldn’t match the blank line, because it would still see the ^M on that line. So before handing the files back to my DOS user, I ran them through another sed shell script to convert them back to DOS text files:

for file in *.tab.sup
do
    # append a carriage return (^M) to the end of every line
    sed 's/$/\r/' "$file" > …/dos/"$file".txt
done
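Note that `\r` in the replacement of an `s` command is a GNU sed feature; POSIX does not define it, so another sed may not produce a carriage return. A sketch comparing it with an awk alternative, assuming a hypothetical LF-only result file; with GNU sed the two outputs should be byte-identical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical Unix-format (LF-only) result file.
printf ' Record Date Time\n01 2011/02/09 09:16:46\n\n' > "$tmp/result.tab.sup"

# As above: append a carriage return to every line (\r here relies on GNU sed).
sed 's/$/\r/' "$tmp/result.tab.sup" > "$tmp/sed.txt"

# More portable alternative: awk prints each line followed by CR LF.
awk '{ printf "%s\r\n", $0 }' "$tmp/result.tab.sup" > "$tmp/awk.txt"
```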


#4

Easier solution to the DOS/UNIX issue: dos2unix.

Or unix2dos, as appropriate. It’ll convert the file in place.
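Where neither dos2unix nor fromdos is installed, roughly the same in-place effect can be had with `tr`. A sketch with a hypothetical helper named `crlf_to_lf` (not a standard command). Unlike dos2unix, it removes every carriage return in the file, not only those at line ends, and it writes back through the original file so the file’s permissions are kept.

```shell
#!/bin/sh
# Hypothetical helper (not a standard command): remove carriage returns from each
# named file, rewriting the file in place.
crlf_to_lf() {
    for f in "$@"; do
        tmpf=$(mktemp) || return 1
        # tr -d '\r' deletes every carriage return, not only those at line ends.
        if tr -d '\r' < "$f" > "$tmpf"; then
            # Copy back through the original file so its permissions are kept.
            cat "$tmpf" > "$f"
        fi
        rm -f "$tmpf"
    done
}

# Demo on a made-up CRLF file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'line one\r\n\r\nline three\r\n' > "$tmp/demo.tab"
crlf_to_lf "$tmp/demo.tab"
```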


#5

Thanks. I’ve used dos2unix before on Solaris systems, but it doesn’t appear to be available in my installation of Ubuntu 10.04. Is that normal? I didn’t know how to get it and install it.

It is available on my Windows machine, which has Windows Services for UNIX installed.


#6

It’s in the ‘tofrodos’ package — “sudo apt-get install tofrodos” should install it.


#7

Thanks, Andrew. I read up on this and found that fromdos and todos are already installed on this Ubuntu system.