Ruby's ARGF

Calle Erlandsson

Many Unix utilities accept input both in the form of filenames passed as command-line arguments and as data sent to the program’s standard input stream. If filenames are passed, the corresponding files will be read in sequence. If not, the standard input stream will be read instead. This behavior makes utilities like cat, grep, and sed versatile and easy to use.

In Ruby, a subset of cat’s features can be re-implemented with the following code:

# cat.rb

if ARGV.length > 0
  ARGV.each do |filename|
    puts File.read(filename)
  end
else
  puts STDIN.read
end

The implementation inspects the length of the ARGV array, which contains all command-line arguments passed to the program. If any arguments are passed, they are treated as filenames, and each file is read and printed. If no arguments are passed, the standard input stream is read and printed instead.

The cat clone can then be used like this:

$ ruby cat.rb file1 file2
Contents of file1
Contents of file2
$ echo "Contents of standard input" | ruby cat.rb
Contents of standard input

It does what it’s supposed to do, but the implementation is very concerned with where its input is coming from. It also duplicates the output functionality in both branches of the conditional. To solve both of these problems, Ruby provides the ARGF stream.

Using the ARGF stream, the cat clone can be re-implemented like so:

# argf.rb

puts ARGF.read

This implementation is oblivious to where its input is coming from and can instead focus on what to do with it.

So what is the ARGF stream? The Ruby documentation describes it as follows:

ARGF is a stream designed for use in scripts that process files given as command-line arguments or passed in via STDIN.

ARGF will interpret all elements of the ARGV array as filenames and when read will produce a concatenation of the contents of these files. If ARGV is empty, then ARGF reads from standard input.

This means that if a program also accepts flags like --color or --line-buffered, these flags have to be removed from the ARGV array before ARGF is read. Otherwise ARGF treats them as filenames and fails with unexpected “No such file or directory” errors.
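One way to remove flags before touching ARGF is Ruby's bundled OptionParser, whose parse! method deletes the options it recognizes from the array it is given. The sketch below is only an illustration: the helper name extract_flags! and the options hash are assumptions, and the flags are the two mentioned above.

```ruby
require "optparse"

# Hypothetical helper (not part of ARGF): strips known flags out of argv in
# place and returns them as a hash, leaving only filenames behind.
def extract_flags!(argv)
  options = { color: false, line_buffered: false }
  OptionParser.new do |parser|
    parser.on("--color") { options[:color] = true }
    parser.on("--line-buffered") { options[:line_buffered] = true }
  end.parse!(argv) # parse! deletes the flags it recognizes from argv
  options
end

# Demo on a stand-in array so the sketch does not wait on standard input.
args = ["--color", "--line-buffered", "file1", "file2"]
extract_flags!(args)
p args # => ["file1", "file2"]
```

In a real script the call would be extract_flags!(ARGV), made before the first read from ARGF, so that only filenames are left for ARGF to open.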

Filenames that are manually added to the ARGV array will also be read by ARGF.
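As a rough sketch of how this could be put to use, a script might queue a fallback file when no filenames were given. The helper name read_with_fallback is made up for illustration, and a temporary file stands in for a real default.

```ruby
require "tempfile"

# Hypothetical helper: if no filenames were passed, queue a fallback file,
# then return everything ARGF yields.
def read_with_fallback(fallback_path)
  ARGV << fallback_path if ARGV.empty?
  ARGF.read
end

# Demo with a throwaway file so the sketch runs on its own.
Tempfile.create("fallback") do |file|
  file.write("Contents of the fallback file\n")
  file.flush
  print read_with_fallback(file.path)
end
```

Note the trade-off: because ARGV is no longer empty, a script written this way reads the fallback file rather than standard input when it is run without arguments.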

As ARGF opens each file for reading, it automatically shifts that filename off the ARGV array.
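The shrinking of ARGV can be observed directly. The following sketch records ARGV at three points while two files are read; the helper name argv_snapshots and the temporary files are assumptions made only for the demonstration.

```ruby
require "tempfile"

# Hypothetical helper: records ARGV before reading, after the first line has
# been read, and after all input has been consumed.
def argv_snapshots(paths)
  ARGV.replace(paths)
  snapshots = [ARGV.dup] # nothing has been read yet
  ARGF.gets              # opens the first file and reads one line
  snapshots << ARGV.dup
  ARGF.read              # drains the remaining input
  snapshots << ARGV.dup
  snapshots
end

# Demo with two throwaway files; prints arrays of two, one, and zero paths.
Tempfile.create("first") do |first|
  Tempfile.create("second") do |second|
    [first, second].each do |file|
      file.write("a line\n")
      file.flush
    end
    argv_snapshots([first.path, second.path]).each { |snapshot| p snapshot }
  end
end
```

A practical consequence is that a program needing the full list of filenames later should copy it, for example with ARGV.dup, before reading from ARGF.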

Many Unix utilities, like cat, also support another helpful feature: a single invocation can combine data from the standard input stream with files named as command-line arguments. This is done by passing the special filename -, which stands for standard input, as a command-line argument:

$ echo "Contents of standard input" | cat file1 - file2
Contents of file1
Contents of standard input
Contents of file2

Luckily, ARGF supports this as well:

$ echo "Contents of standard input" | ruby argf.rb file1 - file2
Contents of file1
Contents of standard input
Contents of file2

In addition to exposing an IO-like interface for reading the contents of multiple files and streams, ARGF also provides handy methods for controlling which file or stream is currently being read.

To get the file that is currently being read, the #file method can be used. If STDIN is currently being read, this method will return an IO object instead of a File:

# file.rb

p ARGF.file
ARGF.read(ARGF.file.size + 1) # Ask for one byte more than the file holds so that ARGF moves on to the next file
p ARGF.file
ARGF.read(ARGF.file.size + 1)
p ARGF.file

$ echo "Contents of standard input" | ruby file.rb file1 file2 -
#<File:file1>
#<File:file2>
#<IO:<STDIN>>

To only get the name of the file currently being read, we can use the #filename method.
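For instance, #filename makes it easy to label each line with the file it came from, much like grep does when searching several files. This is a minimal sketch; the helper name lines_with_filenames and the temporary files are hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: returns every input line prefixed with the name of
# the file ARGF was reading when the line was produced.
def lines_with_filenames(paths)
  ARGV.replace(paths)
  ARGF.each_line.map { |line| "#{ARGF.filename}: #{line}" }
end

# Demo with two throwaway files so the sketch runs on its own.
Tempfile.create("post1") do |first|
  Tempfile.create("post2") do |second|
    first.write("line from the first file\n")
    first.flush
    second.write("line from the second file\n")
    second.flush
    puts lines_with_filenames([first.path, second.path])
  end
end
```

Each printed line starts with the path of the temporary file it was read from, followed by the line itself.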

If our program only processes part of each file, for example the YAML front matter of blog posts written in Markdown, the #close method can be used to close the current file and skip to the next one:

# front_matter.rb

ARGF.each_line do |line|
  if ARGF.lineno > 1 && line == "---\n"
    ARGF.close
  end
  puts line
end

$ ruby front_matter.rb post1.md post2.md
---
title: My First Blog Post
---
---
title: My Second Blog Post
---

ARGF is a great example of Ruby’s way of promoting Unix tradition by making it easy to write well-behaved Unix utilities.

Next time you write a Ruby program for processing data, give ARGF a try!