IO in Ruby

Joël Quenneville

Input/Output, generally referred to as I/O, is a term that covers the ways that a computer interacts with the world. Screens, keyboards, files, and networks are all forms of I/O. Data from these devices is sent to and from programs as a stream of characters/bytes.

Unix-like systems treat all external devices as files. We can see these under the /dev directory. Read this list for a quick description of all the devices we might find under /dev for OS X.

For example (truncated for brevity):

$ tree /dev
/dev
├── disk0
├── fd
│   ├── 0
│   ├── 1
│   ├── 2
│   └── 3 [error opening dir]
├── null
├── stderr -> fd/2
├── stdin -> fd/0
├── stdout -> fd/1
├── tty
└── zero

The streams a process has open appear under the /dev/fd directory. Each one is named by a number, known as a file descriptor. The operating system provides three streams by default. They are:

  • Standard input (/dev/fd/0)
  • Standard output (/dev/fd/1)
  • Standard error (/dev/fd/2)

They are often abbreviated to stdin, stdout, and stderr respectively. Standard input will default to reading from the keyboard while standard output and standard error both default to writing to the terminal. As can be seen above, /dev/stdout, /dev/stdin, and /dev/stderr are just symlinks to the appropriate file descriptor.
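Ruby exposes these three streams as the constants STDIN, STDOUT, and STDERR, and IO#fileno reports the descriptor each one wraps. Here is a quick check, assuming the process was started with the usual three streams:

```ruby
# STDIN, STDOUT, and STDERR wrap the three descriptors the OS opens for every process.
descriptors = { stdin: STDIN.fileno, stdout: STDOUT.fileno, stderr: STDERR.fileno }

descriptors.each do |name, fd|
  puts "#{name} is /dev/fd/#{fd}"
end
# stdin is /dev/fd/0
# stdout is /dev/fd/1
# stderr is /dev/fd/2
```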

The IO class

Ruby IO objects wrap Input/Output streams. The constants STDIN, STDOUT, and STDERR point to IO objects wrapping the standard streams. By default the global variables $stdin, $stdout, and $stderr point to their respective constants. While the constants should always point to the default streams, the globals can be overwritten to point to another I/O stream such as a file. IO objects can be written to via puts and print.

$stdout.puts 'Hello World'

We’ve all written the shorthand version of this program:

puts 'Hello World'

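The bare puts method is provided by Ruby's Kernel module and is equivalent to calling $stdout.puts. Similarly, IO objects can be read from via gets. The bare gets provided by Kernel behaves like $stdin.gets when no file names are passed on the command line. Strictly speaking, it reads through ARGF, which falls back to standard input.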
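Because Kernel#puts forwards to whatever $stdout currently refers to, reassigning that global redirects every bare puts call. The following is a minimal sketch of capturing output this way. The capture_stdout helper is a hypothetical name, and a StringIO serves as the temporary destination:

```ruby
require 'stringio'

# Hypothetical helper: swap $stdout for an in-memory stream while the block runs.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string # everything bare `puts` wrote while $stdout was swapped
ensure
  $stdout = original # always restore the real stream, even if the block raises
end

output = capture_stdout { puts 'Hello World' }
p output # prints "Hello World\n"
```

STDOUT itself is untouched throughout, which is why the constant can always be used to reach the real terminal.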

$stdin is read-only while $stdout and $stderr are write-only.

[1] pry(main)> $stdin.puts 'foo'
IOError: not opened for writing
[2] pry(main)> $stdout.gets
IOError: not opened for reading
[3] pry(main)> $stderr.gets
IOError: not opened for reading

To create a new IO object, we need a file descriptor. In this case, we use 1 (stdout).

[1] pry(main)> io = IO.new(1)
=> #<IO:fd 1>
[2] pry(main)> io.puts 'hello world'
hello world
=> nil

What about creating IOs for other streams? They don't have fixed file descriptors, so we first need to obtain one via IO.sysopen.

[1] pry(main)> fd = IO.sysopen('/dev/null', 'w+')
=> 8
[2] pry(main)> dev_null = IO.new(fd)
=> #<IO:fd 8>
[3] pry(main)> dev_null.puts 'hello'
=> nil
[4] pry(main)> dev_null.gets
=> nil
[5] pry(main)> dev_null.close
=> nil

/dev/null (sometimes referred to as the “bit bucket” or “black hole”) is the null device on Unix-like systems. Anything written to it is discarded, and reading from it immediately hits end-of-file (gets returns nil in Ruby).

First, we get a file descriptor for a read/write stream to the /dev/null device. Then we create an IO object for that descriptor so we can interact with it from Ruby. When we write to dev_null, the text no longer appears on the screen. When we read from dev_null, we get nil.

Since everything on a Unix-like system is a file, we can open an IO stream to a text file in the same way we would open a device. We just create a file descriptor from the path to our file and then create an IO object for that file descriptor. When we are done with it, we close the stream to flush Ruby's buffer and release the file descriptor back to the operating system. Attempting to read from or write to a closed stream will raise an IOError.
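The steps in the paragraph above can be sketched end to end. This version creates its file inside a temporary directory so it cleans up after itself. The file name notes.txt is made up. The sketch shows both the flush on close and the IOError raised by a later write:

```ruby
require 'tmpdir'

written = nil
error = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'notes.txt') # hypothetical file name

  # Ask the OS for a descriptor, creating the file for reading and writing.
  fd = IO.sysopen(path, 'w+')
  io = IO.new(fd)

  io.puts 'first line'
  io.close # flushes Ruby's buffer and releases the descriptor

  written = File.read(path)

  begin
    io.puts 'too late'
  rescue IOError => e
    error = e # writing to a closed stream raises IOError
  end
end

p written     # prints "first line\n"
p error.class # prints IOError
```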

Position

When working with an IO, we have to keep position in mind. Given that we’ve opened a stream to the following file:

Lorem ipsum
dolor
sit amet...

and we call gets on it:

[1] pry(main)> IO.sysopen '/Users/joelquenneville/Desktop/lorem.txt'
=> 8
[2] pry(main)> lorem = IO.new(8)
=> #<IO:fd 8>
[3] pry(main)> lorem.gets
=> "Lorem ipsum\n"

it returns the first line of the file and moves the cursor to the next line. If we check the position of the cursor:

[4] pry(main)> lorem.pos
=> 12

If we call gets a few more times:

[5] pry(main)> lorem.gets
=> "dolor\n"
[6] pry(main)> lorem.gets
=> "sit amet...\n"
[7] pry(main)> lorem.pos
=> 30

we can see Ruby's “cursor” has moved. Now that we have read the whole file, what happens if we try to call gets?

[8] pry(main)> lorem.gets
=> nil
[9] pry(main)> lorem.eof?
=> true

We see that it returns nil. We can ask a stream if we have reached “end of file” via eof?. To return to the beginning of the stream, we can call rewind.

[10] pry(main)> lorem.rewind
=> 0
[11] pry(main)> lorem.pos
=> 0

This can lead to surprises when writing to a stream.

[1] pry(main)> fd = IO.sysopen '/Users/joelquenneville/Desktop/test.txt', 'w+'
=> 8
[2] pry(main)> io = IO.new(fd)
=> #<IO:fd 8>
[3] pry(main)> io.puts 'hello world'
=> nil
[4] pry(main)> io.puts 'goodbye world'
=> nil

This stream has the lines “hello world” and “goodbye world”. If we were to attempt to read:

[5] pry(main)> io.gets
=> nil
[6] pry(main)> io.eof?
=> true

Our cursor is currently at the end of the file. In order to read we would need to first rewind.

[7] pry(main)> io.rewind
=> 0
[8] pry(main)> io.gets
=> "hello world\n"

Any write operations in the middle of a stream will overwrite the existing data:

[9] pry(main)> io.pos
=> 12
[10] pry(main)> io.puts "middle"
=> nil
[11] pry(main)> io.rewind
=> 0
[12] pry(main)> io.read
=> "hello world\nmiddle\n world\n"

This kind of behavior is necessary because streams are not loaded into memory all at once. Instead, only the data being operated on (plus a small buffer) is read in. This is very useful because some streams point to very large files that would be expensive to load into memory all at once. Streams can also be unbounded. For example, $stdin attached to a terminal has no predetermined end: we can always read more data from it (when it receives gets, it waits for the user to type something).
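This sketch makes the memory point concrete. It totals the numbers in a file by walking it one line at a time with each_line, so only the current line (plus Ruby's internal buffer) is held as a String at any moment. The Tempfile and its 1,000 generated lines stand in for a genuinely large file:

```ruby
require 'tempfile'

total = Tempfile.create('numbers') do |file|
  # Write some sample data: "line 0\n", "line 1\n", ... "line 999\n".
  1_000.times { |i| file.puts "line #{i}" }
  file.rewind # move the cursor back to the start before reading

  # each_line yields one line at a time instead of reading the whole file.
  file.each_line.sum { |line| line.split.last.to_i }
end

puts total # prints 499500
```

For files referenced by path, File.foreach offers the same line-at-a-time iteration without an explicit open.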

Sub-classes and Duck-types

Ruby gives us a few classes that are specialized for particular kinds of I/O. Some are subclasses of IO. Others are duck types that behave like an IO without inheriting from it:

File

File docs

Probably the best-known IO subclass. File lets us read and write files without managing file descriptors ourselves. It also adds file-specific convenience methods such as File#size, File#chmod, and File.path.
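Here is the same open, read, and close cycle sketched with File. It uses a throwaway path inside a temporary directory, and greeting.txt is a made-up name. File.open accepts a path directly, and the block form closes the file for us:

```ruby
require 'tmpdir'

info = Dir.mktmpdir do |dir|
  path = File.join(dir, 'greeting.txt') # hypothetical file name

  # No IO.sysopen/IO.new step: File.open takes the path and a mode.
  File.open(path, 'w') { |f| f.puts 'hello world' }

  File.open(path) do |f|
    {
      line: f.gets,             # first line, including its newline
      size: f.size,             # size on disk in bytes
      same_path: f.path == path,
      is_io: f.is_a?(IO)        # File is a subclass of IO
    }
  end
end

p info
```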

The Sockets

Socket docs

Ruby's various socket classes all ultimately inherit from IO.

For example, with a server running on localhost:3000:

[1] pry(main)> require 'socket'
=> true
[2] pry(main)> socket = TCPSocket.new 'localhost', 3000
=> #<TCPSocket:fd 10>
[3] pry(main)> socket.puts 'GET "/"'
=> nil
[4] pry(main)> socket.gets
=> "HTTP/1.1 400 Bad Request \r\n"
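The session above needs a server already listening on port 3000. To try the same idea without one, the sketch below starts a TCPServer on the loopback interface in the same process, where port 0 asks the OS for any free port, and talks to it with a TCPSocket. This assumes loopback networking is available. The ping and pong messages are arbitrary:

```ruby
require 'socket'

# Listen on an OS-assigned free port on the loopback interface.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

client = TCPSocket.new('127.0.0.1', port)
connection = server.accept

# Both ends are IO objects, so the familiar puts/gets work on them.
client.puts 'ping'
received = connection.gets
connection.puts 'pong'
reply = client.gets

p received                            # prints "ping\n"
p reply                               # prints "pong\n"
p TCPSocket.ancestors.include?(IO)    # prints true

[client, connection, server].each(&:close)
```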

StringIO

StringIO docs

StringIO allows strings to behave like IOs. This is useful when we want to pass strings into code that consumes streams. It is common in tests, where we might inject a StringIO instead of reading an actual file from disk. Unlike the classes shown so far, StringIO does not inherit from IO.

[1] pry(main)> string_io = StringIO.new('hello world')
=> #<StringIO:0x007feacb0cd4e8>
[2] pry(main)> string_io.gets
=> "hello world"
[3] pry(main)> string_io.puts 'goodbye world'
=> nil
[4] pry(main)> string_io.rewind
=> 0
[5] pry(main)> string_io.read
=> "hello worldgoodbye world\n"
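The payoff of this duck typing is that code written against the IO interface does not care whether it receives a StringIO or a real file. In this minimal sketch, count_words is a hypothetical consumer that only calls each_line:

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical consumer: anything that responds to each_line will do.
def count_words(io)
  io.each_line.sum { |line| line.split.size }
end

from_string = count_words(StringIO.new("hello world\ngoodbye world\n"))

from_file = Tempfile.create('words') do |file|
  file.write("hello world\ngoodbye world\n")
  file.rewind
  count_words(file)
end

p [from_string, from_file]         # prints [4, 4]
p StringIO.ancestors.include?(IO)  # prints false
```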

Tempfile

Tempfile docs

Tempfile is another class that doesn't inherit from IO. Instead, it delegates to an underlying File object (it is defined as a DelegateClass(File)), so it responds to File's interface while managing a temporary file on disk. As such, it can be passed to any object that consumes IO-like objects.
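Here is a short sketch of Tempfile in use. The 'report' prefix and the CSV-style lines are illustrative. close! closes the file and deletes it from disk immediately rather than waiting for garbage collection:

```ruby
require 'tempfile'

tmp = Tempfile.new('report') # 'report' is only a filename prefix
begin
  # Calls like puts, rewind, and gets are delegated to the wrapped File.
  tmp.puts 'id,total'
  tmp.puts '1,42'
  tmp.rewind

  header = tmp.gets
  on_disk = File.exist?(tmp.path)

  # Tempfile wraps a File by delegation rather than subclassing IO.
  inherits_io = Tempfile.ancestors.include?(IO)
ensure
  tmp.close! # close and delete the file right away
end

p header      # prints "id,total\n"
p on_disk     # prints true
p inherits_io # prints false
```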

Putting it all together

Say we have the following class for some command-line program:

class SystemTask
  def execute
    puts "preparing to execute"

    puts "starting first task"
    first_task

    puts "starting second task"
    second_task

    puts "execution complete"
  end
end

Testing this class causes all of these messages to be printed, cluttering our test output. One approach to solving this problem is to inject an IO object instead of calling Kernel#puts, and to pass in a null object in tests.

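class SystemTask
  def initialize(io = $stdout)
    @io = io
  end

  def execute
    @io.puts "preparing to execute"

    @io.puts "starting first task"
    first_task

    @io.puts "starting second task"
    second_task

    @io.puts "execution complete"
  end

  private

  # Placeholders for the real work, which the example leaves out.
  def first_task; end

  def second_task; end
end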

In production, we can still call SystemTask.new.execute as before, and it will write to $stdout. In tests, we can now pass in our own IO: a test double, a StringIO, or a stream to /dev/null.

require 'stringio'

describe SystemTask do
  # test double
  it "executes tasks with a test double" do
    io = double("io", puts: nil)
    system_task = SystemTask.new(io)

    system_task.execute

    # expect things to have happened

    # if we care about the messages, we can also expect on the double
    expect(io).to have_received(:puts).with("preparing to execute")
  end

  # StringIO
  it "executes tasks with a StringIO" do
    io = StringIO.new
    system_task = SystemTask.new(io)

    system_task.execute

    # expect things to have happened

    # if we care about the messages, rewind and read back what was written
    io.rewind
    expect(io.read).to eq "preparing to execute\nstarting first task\n" \
                          "starting second task\nexecution complete\n"
  end

  # /dev/null
  it "executes tasks with /dev/null" do
    # the block form closes the stream when the test is done with it
    File.open(File::NULL, 'w') do |io|
      system_task = SystemTask.new(io)

      system_task.execute

      # expect things to have happened
    end

    # only use /dev/null if we don't care about the messages
  end
end

Working with disparate APIs

While working on a recent project that pulled reports from several APIs, we noticed that some responses were strings, others were CSV documents, and still others generated the report and then required a request to another endpoint to download it.

The solution was to create an adapter for each API that would fetch the data and return it in a standard format, wrapped in some type of IO-like object. A persistor object could then process and persist any of the reports, as long as they were formatted the same way and were IO-like. For example:

class API1Report
  def fetch
    # fetch report (comes down as a CSV doc)
    # process it to get it in a standard format
    # return standardized report as a Tempfile object
  end
end

class API2Report
  def fetch
    # fetch report
    # returns it as a File object
  end
end

class Persistor
  def initialize(report)
    @report = report
  end

  def persist
    # process and persist the report
  end
end
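The method bodies above are left as comments. The following is one way to fill in that shape, as a sketch only. StringReport and TempfileReport are hypothetical stand-ins for API1Report and API2Report. The report format (a header row followed by id,total lines) is assumed, and an in-memory array plays the role of the database, so this sketch also passes that store into Persistor's constructor. Persistor relies only on gets, each_line, and close, so it works with either adapter's IO-like object:

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical adapter whose API returned the report as a string.
class StringReport
  def fetch
    StringIO.new("id,total\n1,10\n2,20\n")
  end
end

# Hypothetical adapter that downloads the report into a temporary file.
class TempfileReport
  def fetch
    file = Tempfile.new('report')
    file.write("id,total\n3,30\n")
    file.rewind # hand the report back with the cursor at the start
    file
  end
end

class Persistor
  def initialize(report, store)
    @report = report
    @store = store # stand-in for a database table
  end

  def persist
    @report.gets # skip the header row
    @report.each_line do |line|
      id, total = line.chomp.split(',')
      @store << { id: Integer(id), total: Integer(total) }
    end
  ensure
    @report.close
  end
end

store = []
[StringReport.new, TempfileReport.new].each do |adapter|
  Persistor.new(adapter.fetch, store).persist
end

p store
```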

What’s next

Read an overview of 4.4 BSD’s I/O to develop a deeper understanding of Unix I/O, file descriptors, and devices.

Read the TTY system to understand the relationship between Unix jobs, processes, and I/O with the TTY device.

Practice Ruby I/O by cloning this repo.

Finally, go deeper into Ruby’s I/O in this chapter from Read Ruby.