How to generate a large list of sequential numbers in the shell

Posted: July 01, 2021

tl;dr

Use the bc command if you can.

https://www.gnu.org/software/bc/manual/html_mono/bc.html

# generate 1M(+1) numbers from 1,000,000 to 2,000,000. 
echo 'for(i=1000000;i<=2000000;i++)i' | bc -q > output.txt
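
One nice property of bc is that it uses arbitrary-precision integer arithmetic, so the same pattern stays exact at magnitudes where floating-point tools start rounding. A minimal sketch, with a range chosen purely for illustration:

# exact integers even around 10^15
echo 'for(i=10^15;i<=10^15+5;i++)i' | bc -q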

Motivation

I wanted to figure out how much a simple (but huge!) list of numbers can be compressed by zip or other archivers.

I know the easy way to generate one is to use MS Excel or Google Sheets.

Since I only have macOS, I figured MS Excel would likely freeze with more than 10M rows.

And a Google Sheets spreadsheet can only have up to 5M cells.

Then, to generate the numbers, I used seq.

But it switches to scientific notation once the numbers pass 1 million (at least the BSD seq that ships with macOS does).

$ seq 1000000 1000005
1e+06
1e+06
1e+06
1e+06
1e+06
1.00000e+06

I then tried the -f "%g" option, but it did not change the output at all; as it turns out, %g is seq's default format, which is exactly what produces the scientific notation.

(Actually, the -f "%.0f" option does the trick; I only found that after a while...)

$ seq -f "%.0f" 1000000 1000005
1000000
1000001
1000002
1000003
1000004
1000005
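
As an aside, macOS also ships BSD jot, which can generate the same kind of sequence. A minimal sketch, using -w '%d' to force plain integer output (the filename is just for illustration):

# print 1,000,001 integers starting at 1,000,000, one per line
jot -w '%d' 1000001 1000000 > output_jot.txt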

Then I tried the bc command, and it outputs exactly what I want (the -q flag just suppresses bc's welcome banner). Besides that, it runs fast on my macOS!

echo "for(i=1000000;i<=2000000;i++)i" | bc -q > output.txt

Benchmark

I assumed bc must be slower than seq, but it is not (at least on my MacBook).

These are the competitors; each one generates 1M+1 sequential numbers, from 1,000,000 to 2,000,000.

#1 use seq
for i in $(seq -f "%.0f" 1000000 1 2000000); do echo $i; done > output_seq.txt

#2 use bc
echo 'for(i=1000000;i<=2000000;i++)i' | bc -q > output_bc.txt

#3 use seq, but append to the file in each loop iteration; this must be SLOW!
for i in $(seq -f "%.0f" 1000000 1 2000000); do echo $i >> output_seq2.txt; done
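
(For reference, seq can also write straight to the file with no shell loop at all. I did not time this variant, but it skips the per-line echo entirely, so it should be at least as fast as #1; the output filename here is just for illustration.)

# untimed: let seq write directly to the file, no loop
seq -f "%.0f" 1000000 1 2000000 > output_seq_direct.txt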

Benchmark Results

#1 use seq
$ time /bin/bash -c 'for i in $(seq -f "%.0f" 1000000 1 2000000); do echo $i; done > output_seq.txt'
2.93s user 1.64s system 98% cpu 4.647 total

#2 use bc
$ time /bin/bash -c 'echo "for(i=1000000;i<=2000000;i++)i" | bc -q > output_bc.txt'
0.57s user 1.58s system 95% cpu 2.259 total

#3 use seq, appending in each loop
$ time /bin/bash -c 'for i in $(seq -f "%.0f" 1000000 1 2000000); do echo $i >> output_seq2.txt; done'
7.31s user 36.38s system 91% cpu 47.628 total

# validation: diff printing nothing means the files are identical
$ diff output_seq.txt output_bc.txt
$ diff output_seq.txt output_seq2.txt
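
As an extra sanity check, the range from 1,000,000 to 2,000,000 contains exactly 1,000,001 numbers and every tool prints one per line, so all three line counts should agree:

# each file should report 1000001 lines
wc -l output_seq.txt output_bc.txt output_seq2.txt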

Compression Results

Voila!

# make zip file
$ zip output_bc.txt.zip output_bc.txt
adding: output_bc.txt (deflated 73%)

# make a bzip2 file (note: tar cvjf actually creates a bzip2-compressed tar archive, despite the plain .bz2 name)
$ tar cvjf output_bc.txt.bz2 output_bc.txt

# compare the rough sizes
$ du -h output_bc.txt*
7.7M	output_bc.txt
1.0M	output_bc.txt.bz2
2.0M	output_bc.txt.zip
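
The uncompressed size also checks out: every number in this range is 7 digits plus a newline, so the file should be 1,000,001 × 8 = 8,000,008 bytes, about 7.6 MiB, which lines up with the 7.7M that du reports. Fittingly, bc can do the arithmetic:

# 1,000,001 lines x 8 bytes (7 digits + newline) each
$ echo '1000001 * 8' | bc
8000008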