A common pattern in Go programs is to change the behavior of a value by wrapping it in a new type that implements the same interface(s) as the original. The new type can then be used anywhere the old one could be used, without having to update the calling code.
For example, you might open a file and read its contents:
f, _ := os.Open("somefile.txt")
bs, _ := io.ReadAll(f)
io.ReadAll expects an io.Reader, a small interface with a single method:
type Reader interface {
	Read(p []byte) (n int, err error)
}
Suppose we want to limit how much of the file we read, perhaps as a security measure against untrusted input. Go's io package provides an easy way to do this with io.LimitReader:
f, _ := os.Open("somefile.txt")
lr := io.LimitReader(f, 1024) // only allow reading up to 1024 bytes
bs, _ := io.ReadAll(lr)
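The same idea can be tried without a file on disk. The following is a minimal sketch, assuming strings.NewReader as a stand-in for the *os.File; readLimited is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readLimited reads at most limit bytes from s. strings.NewReader stands in
// for the file so the sketch runs without touching the filesystem.
func readLimited(s string, limit int64) (string, error) {
	lr := io.LimitReader(strings.NewReader(s), limit)
	bs, err := io.ReadAll(lr)
	return string(bs), err
}

func main() {
	out, err := readLimited("hello, world", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // hello
}
```

io.ReadAll never learns that it is talking to a wrapper; it simply sees io.EOF after five bytes.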
The beauty of this approach is that nothing about the call to io.ReadAll has to change. The interface is the same, and the fact that we're not using an *os.File is transparent to the code using it. (A deeper look into this design approach can be seen in my talk: How Go is Unique: Static Linking, Composition and Russian Doll Coding.)
At least that's how it's supposed to work. In reality the wrapper is not transparent, because it only implements a subset of the interfaces supported by the wrapped type. For example, *os.File implements io.Writer, io.Seeker, io.Closer, and more, but the *io.LimitedReader returned by io.LimitReader only implements io.Reader:
f, _ := os.Open("somefile.txt")
var r io.Reader = f // type assertions need an interface value
_, ok := r.(io.Writer) // ok == true
r = io.LimitReader(f, 1024)
_, ok = r.(io.Writer) // ok == false
I call this phenomenon Interface Erasure.
Losing our interfaces doesn't much matter in the case of io.ReadAll, since it only uses the Read method. And because the function declares that it requires an io.Reader, that requirement is checked at compile time.
But what if the function didn't declare an io.Reader parameter? What if, for example, it merely accepted an empty interface, ReadAll(someArgument interface{}), and then used a type assertion to check whether the argument was an io.Reader? It would be very easy to overlook this runtime-enforced type assertion and break our interface transparency.
That would be an exceedingly odd way of writing ReadAll, so does this problem ever really happen?
Sadly, yes. Here are some examples:
io.Copy supports a more efficient copy mechanism via the io.ReaderFrom and io.WriterTo interfaces, which *os.File leverages by implementing ReadFrom with the copy_file_range system call on Linux. When wrapped, *os.File loses this method and io.Copy falls back to the much slower generic copy loop.
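To see the fallback without touching the filesystem, the sketch below uses a hypothetical fastSource type that implements io.WriterTo and records whether io.Copy used it. It relies on io.Copy's documented behavior of calling src.WriteTo(dst) when the source implements io.WriterTo.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// fastSource is a hypothetical reader standing in for a type like *os.File
// that offers an optimized copy path via io.WriterTo.
type fastSource struct {
	r           *strings.Reader
	usedWriteTo bool // set when io.Copy takes the WriteTo fast path
}

func (s *fastSource) Read(p []byte) (int, error) { return s.r.Read(p) }

func (s *fastSource) WriteTo(w io.Writer) (int64, error) {
	s.usedWriteTo = true
	return s.r.WriteTo(w)
}

// copyUsedWriteTo copies from a fastSource, optionally wrapped in
// io.LimitReader, and reports whether the fast path was taken.
func copyUsedWriteTo(wrap bool) bool {
	src := &fastSource{r: strings.NewReader("hello")}
	var r io.Reader = src
	if wrap {
		r = io.LimitReader(src, 1024) // *io.LimitedReader has no WriteTo
	}
	var buf bytes.Buffer
	if _, err := io.Copy(&buf, r); err != nil {
		panic(err)
	}
	return src.usedWriteTo
}

func main() {
	fmt.Println(copyUsedWriteTo(false)) // true
	fmt.Println(copyUsedWriteTo(true))  // false
}
```

The data still arrives intact in both cases; only the fast path is silently lost.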
The http package has several additional interfaces an http.ResponseWriter might implement:
type Flusher interface {
	Flush()
}
type Hijacker interface {
	Hijack() (net.Conn, *bufio.ReadWriter, error)
}
type Pusher interface {
	Push(target string, opts *PushOptions) error
}
When wrapped, an http.ResponseWriter loses these methods, which can break HTTP functionality. For example, you won't be able to upgrade the connection to a WebSocket anymore, since that relies on Hijacker.
encoding/gob.NewDecoder tests the io.Reader to see if it supports io.ByteReader. If not, it wraps it in a new bufio.Reader. When the reader is wrapped we could end up double-buffering.
encoding/json.Unmarshal tests the destination value for the json.Unmarshaler interface and calls it if found. When wrapped, we might inadvertently fall back to the built-in JSON encoding/decoding.
In the wild I've been bitten by the HTTP and IO interfaces, which seem to be the most commonly type-asserted ones.
When you know the exact type you'd like to wrap, the most straightforward solution to this problem is to use an embedded field. Embedded fields are struct fields declared with a type but no explicit name:
type T1 struct {
	example int
}
type T2 struct {
	T1    // embedded
	t1 T1 // not embedded
}
var t2 T2
// promoted: t2.example is shorthand for t2.T1.example
Embedded fields have the nice property that their fields and methods are "promoted". From the Go specification:
A field or method f of an embedded field in a struct x is called promoted if x.f is a legal selector that denotes that field or method f. [...] Promoted fields act like ordinary fields of a struct
In the example above we can access the example field directly on a T2 value because the T1 type has been embedded within T2.
Using this feature we can create a new composite type with two fields. The first is the original type, embedded so that we inherit all of its methods. The second is a non-embedded field holding the wrapper whose methods we'd like to use instead. Taking our original example:
type fileLimitReader struct {
	*os.File
	reader io.Reader
}
func (r fileLimitReader) Read(p []byte) (int, error) {
	return r.reader.Read(p)
}
func FileLimitReader(file *os.File, limit int64) io.Reader {
	return fileLimitReader{file, io.LimitReader(file, limit)}
}
func main() {
	f, _ := os.Create("example.txt")
	w := FileLimitReader(f, 1024)
	_, ok1 := w.(io.Closer)
	_, ok2 := w.(io.Reader)
	_, ok3 := w.(io.Writer)
	fmt.Println(ok1, ok2, ok3)
	// => true true true
}
In this code a call to Read on a fileLimitReader goes to the LimitedReader, whereas any other method call goes to the File via promotion. That is, for a fileLimitReader r, r.Read is equivalent to r.reader.Read and r.Write is equivalent to r.File.Write.
Is it possible to simplify this code? What if we just embedded both fields?
func FileLimitReader(file *os.File, limit int64) io.Reader {
	return struct{*os.File;io.Reader}{file,io.LimitReader(file, limit)}
}
Unfortunately the compiler won't let you do this:
./prog.go:26:3: struct { *os.File; io.Reader }.Read is ambiguous
The verbosity is a bit tedious, but the bigger issue with this approach is that it requires a type-specific implementation for every wrapper. It would be much nicer if we could have a single LimitReader function which accepted a mere Reader and yet didn't erase all the other possible interfaces. Is that possible?
Our first crack at it might be to embed the empty interface, thus accepting any type, e.g.:
type anyLimitReader struct {
	interface{} // not valid
	reader io.Reader
}
But that won't work because interface{} isn't a valid type name. Let's try giving it a name:
type T interface{}
type anyLimitReader struct {
	T
	reader io.Reader
}
That's compilable Go, but it doesn't help us because a new type loses all the methods of the original:
A defined type may have methods associated with it. It does not inherit any methods bound to the given type.
But maybe a type alias is an option, since it preserves the methods?
package main
import (
	"fmt"
	"io"
	"os"
)
type T = interface{}
type anyLimitReader struct {
	T
	reader io.Reader
}
func (r anyLimitReader) Read(p []byte) (int, error) {
	return r.reader.Read(p)
}
func AnyLimitReader(any interface{}, limit int64) io.Reader {
	return anyLimitReader{any, io.LimitReader(any.(io.Reader), limit)}
}
func main() {
	f, _ := os.Create("example.txt")
	w := AnyLimitReader(f, 1024)
	_, ok1 := w.(io.Closer)
	_, ok2 := w.(io.Reader)
	_, ok3 := w.(io.Writer)
	fmt.Println(ok1, ok2, ok3)
	// => false true false
}
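So that doesn't work either. Promotion is decided at compile time from the method set of the embedded field's declared type, and interface{} has no methods, so nothing is promoted no matter what value the field holds at run time. However, there is another strategy we could pursue.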
Rather than store the *os.File type, what if we stored an interface that represented the method set of *os.File?
package main
import (
	"fmt"
	"io"
	"os"
	"syscall"
	"time"
)
type File interface {
	Chdir() error
	Chmod(mode os.FileMode) error
	Chown(uid, gid int) error
	Close() error
	Fd() uintptr
	Name() string
	Read(b []byte) (n int, err error)
	ReadAt(b []byte, off int64) (n int, err error)
	ReadDir(n int) ([]os.DirEntry, error)
	ReadFrom(r io.Reader) (n int64, err error)
	Readdir(n int) ([]os.FileInfo, error)
	Readdirnames(n int) (names []string, err error)
	Seek(offset int64, whence int) (ret int64, err error)
	SetDeadline(t time.Time) error
	SetReadDeadline(t time.Time) error
	SetWriteDeadline(t time.Time) error
	Stat() (os.FileInfo, error)
	Sync() error
	SyscallConn() (syscall.RawConn, error)
	Truncate(size int64) error
	Write(b []byte) (n int, err error)
	WriteAt(b []byte, off int64) (n int, err error)
	WriteString(s string) (n int, err error)
}
type fileLimitReader struct {
	File
	reader io.Reader
}
func (r fileLimitReader) Read(p []byte) (int, error) {
	return r.reader.Read(p)
}
func FileLimitReader(file File, limit int64) io.Reader {
	return fileLimitReader{file, io.LimitReader(file, limit)}
}
func main() {
	f, _ := os.Create("example.txt")
	w := FileLimitReader(f, 1024)
	_, ok1 := w.(io.Closer)
	_, ok2 := w.(io.Reader)
	_, ok3 := w.(io.Writer)
	fmt.Println(ok1, ok2, ok3)
	// => true true true
}
This preserves the method set of any type that implements all the same methods as *os.File. Unfortunately, that's not all that useful, because *os.File has a ton of methods and few other types implement all of them. So let's first strip it down to the methods we care about (the ones likely to be type-asserted). For example:
type File interface {
	Close() error
	Read(b []byte) (n int, err error)
	ReadAt(b []byte, off int64) (n int, err error)
	ReadFrom(r io.Reader) (n int64, err error)
	Seek(offset int64, whence int) (ret int64, err error)
	Write(b []byte) (n int, err error)
	WriteAt(b []byte, off int64) (n int, err error)
}
All of these happen to have interfaces defined in io, so we can also define File like this:
type File interface {
	io.Closer
	io.Reader
	io.ReaderAt
	io.ReaderFrom
	io.Seeker
	io.Writer
	io.WriterAt
}
Interface embedding may not be super common in day-to-day Go coding, but notice how similar it looks to struct embedding:
type File struct {
	io.Closer
	io.Reader
	io.ReaderAt
	io.ReaderFrom
	io.Seeker
	io.Writer
	io.WriterAt
}
And this weird parallel actually works because of method promotion on embedded types. That is to say:
x := FileViaInterfaceEmbedding(f)
y := FileViaStructEmbedding{f, f, f, f, f, f, f}
// x.Read(p), y.Read(p) and y.Reader.Read(p) all end up calling f.Read(p)
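A small runnable sketch of the parallel, using hypothetical readWriterI and readWriterS types over a bytes.Buffer: the struct satisfies the interface purely through promotion, and both routes reach the same buffer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readWriterI embeds interfaces in an interface: its method set is the
// union of io.Reader and io.Writer.
type readWriterI interface {
	io.Reader
	io.Writer
}

// readWriterS embeds the same interfaces in a struct. Read and Write are
// promoted from the fields, so readWriterS also satisfies readWriterI.
type readWriterS struct {
	io.Reader
	io.Writer
}

// Compile-time check that the struct satisfies the interface.
var _ readWriterI = readWriterS{}

// roundTrip writes s through the struct and reads it back through the
// interface; both end up calling methods on the same bytes.Buffer.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	var x readWriterI = &buf
	y := readWriterS{&buf, &buf}
	if _, err := y.Write([]byte(s)); err != nil { // y.Write is y.Writer.Write
		return "", err
	}
	out, err := io.ReadAll(x) // calls buf.Read via the interface
	return string(out), err
}

func main() {
	out, err := roundTrip("ping")
	fmt.Println(out, err) // ping <nil>
}
```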
We can leverage this:
package main
import (
	"fmt"
	"io"
	"os"
)
type File struct {
	io.Closer
	io.Reader
	io.ReaderAt
	io.ReaderFrom
	io.Seeker
	io.Writer
	io.WriterAt
}
type fileLimitReader struct {
	File
	reader io.Reader
}
func (r fileLimitReader) Read(p []byte) (int, error) {
	return r.reader.Read(p)
}
func AnyLimitReader(any io.Reader, limit int64) io.Reader {
	// create the limited reader once and share it, so File.Reader and
	// fileLimitReader.Read draw on the same byte budget
	lr := io.LimitReader(any, limit)
	var f File
	if closer, ok := any.(io.Closer); ok {
		f.Closer = closer
	}
	f.Reader = lr
	if readerAt, ok := any.(io.ReaderAt); ok {
		f.ReaderAt = readerAt
	}
	if readerFrom, ok := any.(io.ReaderFrom); ok {
		f.ReaderFrom = readerFrom
	}
	if seeker, ok := any.(io.Seeker); ok {
		f.Seeker = seeker
	}
	if writer, ok := any.(io.Writer); ok {
		f.Writer = writer
	}
	if writerAt, ok := any.(io.WriterAt); ok {
		f.WriterAt = writerAt
	}
	return fileLimitReader{f, lr}
}
func main() {
	f, _ := os.Create("example.txt")
	w := AnyLimitReader(f, 1024)
	_, ok1 := w.(io.Closer)
	_, ok2 := w.(io.Reader)
	_, ok3 := w.(io.Writer)
	fmt.Println(ok1, ok2, ok3)
	// => true true true
}
Now we're getting somewhere. Our AnyLimitReader takes a mere io.Reader but manages to implement a bunch of other IO interfaces without embedding the *os.File concrete type. For example, you can do this:
func main() {
	w := AnyLimitReader(strings.NewReader("xyz"), 1024)
	_, ok1 := w.(io.Closer)
	_, ok2 := w.(io.Reader)
	_, ok3 := w.(io.Writer)
	fmt.Println(ok1, ok2, ok3)
	// => true true true
}
So we get all those methods even though we're not using an *os.File any more...
But now we've fallen off the other end: strings.Reader doesn't implement Closer or Writer, and if we try to call those methods we'll get a nil pointer dereference panic. So this isn't going to work. What we need instead is a way of constructing a struct that, via embedding, only contains the methods the wrapped value actually implements. That way we preserve the existing interfaces without accidentally introducing ones we don't support.
Let's consider what this is going to take. Suppose we want to support a single additional interface, io.Closer:
// => result will either be a bare io.Reader, or an io.ReadCloser
func AnyLimitReader(any io.Reader, limit int64) io.Reader {
	reader := io.LimitReader(any, limit)
	if closer, ok := any.(io.Closer); ok {
		return struct{ io.Closer; io.Reader }{closer, reader}
	} else {
		return reader
	}
}
And now we can add support for io.Writer too:
// => result will either be a bare io.Reader, an io.ReadCloser, an io.ReadWriter or an io.ReadWriteCloser
func AnyLimitReader(any io.Reader, limit int64) io.Reader {
	reader := io.LimitReader(any, limit)
	closer, closerOK := any.(io.Closer)
	writer, writerOK := any.(io.Writer)
	switch {
	case closerOK && writerOK:
		return struct{ io.Closer; io.Writer; io.Reader }{closer, writer, reader}
	case closerOK:
		return struct{ io.Closer; io.Reader }{closer, reader}
	case writerOK:
		return struct{ io.Writer; io.Reader }{writer, reader}
	default:
		return reader
	}
}
Notice that it's not enough to assert that any is an io.Closer once and be done with it. We also need to handle the case where it's both an io.Closer and an io.Writer. What happens when we add a third interface?
I will cut to the chase. This is a classic combinatorics problem. For 1 interface there are 2 cases, for 2 interfaces there are 4 cases, for 3 interfaces there are 8 cases, ... in general, for n interfaces we need 2^n cases, one for each subset. That's a lot of structs.
This approach falls into a weird gray area. It's just big enough that maintaining it by hand would be miserable, but just fast enough that it's computationally feasible. Typically a reader is constructed far less often than it is used, which means a bit of up-front computation cost and code bloat is probably reasonable.
But no one in their right mind would want to write all that code. Fortunately, we don't have to: we can generate it.
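A generator for this does not have to be complicated. The following is a minimal, hypothetical sketch that produces one switch case per subset of n interfaces, using simplified tNNi/oNN names; unlike a real generator it does not distinguish wrapped output types or call wrapper functions.

```go
package main

import (
	"fmt"
	"strings"
)

// genCases returns one switch case per subset of n interfaces. Bit k of the
// mask selects interface k; the type names tNNi and value names oNN are a
// simplified, hypothetical scheme (no wrapped "o" types or wrapper calls).
func genCases(n int) []string {
	cases := make([]string, 0, 1<<n)
	for mask := 0; mask < 1<<n; mask++ {
		var types, vals []string
		for k := 0; k < n; k++ {
			if mask&(1<<k) != 0 {
				types = append(types, fmt.Sprintf("t%02di", k))
				vals = append(vals, fmt.Sprintf("o%02d", k))
			}
		}
		cases = append(cases, fmt.Sprintf("case 0x%04x: return struct{%s}{%s}",
			mask, strings.Join(types, ";"), strings.Join(vals, ",")))
	}
	return cases
}

func main() {
	for _, c := range genCases(2) {
		fmt.Println(c)
	}
	// case 0x0000: return struct{}{}
	// case 0x0001: return struct{t00i}{o00}
	// case 0x0002: return struct{t01i}{o01}
	// case 0x0003: return struct{t00i;t01i}{o00,o01}
}
```

Doubling n doubles nothing but the loop bound; the output grows as 2^n lines, which is exactly why it is generated rather than written.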
This solution can be seen in all its absurdity here: github.com/badgerodon/contextaware.
First we define all the types we care about:
func wrapIO(i interface{}) interface{} {
	type (
		t00i = io.Closer
		t01i = io.Reader
		t01o = Reader
		t02i = io.ReaderAt
		t02o = ReaderAt
		t03i = io.ReaderFrom
		t04i = io.Seeker
		t05i = io.Writer
		t05o = Writer
		t06i = io.WriterAt
		t06o = WriterAt
		t07i = io.WriterTo
	)
Some of these types are wrapped and others are left as is; hence the distinction between the i suffix (the original io interface) and the o suffix (the wrapped replacement). Next we define the wrapper functions we will use:
	var (
		f01 = wrapReader
		f02 = wrapReaderAt
		f05 = wrapWriter
		f06 = wrapWriterAt
	)
In the AnyLimitReader example above we used a switch statement with a bunch of boolean cases. But once we get to hundreds or thousands of cases there's a better approach: a switch on an integer, which Go can optimize into a jump table.†
To make this work, we use an integer as a bit set. Each interface is given its own bit position, and we OR the bits together to get the final number.
var f uint64
o00, b00 := i.(t00i)
if b00 { f |= 0x0001 }
o01, b01 := i.(t01i)
if b01 { f |= 0x0002 }
o02, b02 := i.(t02i)
if b02 { f |= 0x0004 }
o03, b03 := i.(t03i)
if b03 { f |= 0x0008 }
o04, b04 := i.(t04i)
if b04 { f |= 0x0010 }
o05, b05 := i.(t05i)
if b05 { f |= 0x0020 }
o06, b06 := i.(t06i)
if b06 { f |= 0x0040 }
o07, b07 := i.(t07i)
if b07 { f |= 0x0080 }
We then write a giant switch statement with every possible bit combination and the corresponding struct type:
switch f {
case 0x0000: return struct{}{}
case 0x0001: return struct{t00i}{o00}
case 0x0002: return struct{t01o}{f01(o01)}
case 0x0003: return struct{t00i;t01o}{o00,f01(o01)}
case 0x0004: return struct{t02o}{f02(o02)}
case 0x0005: return struct{t00i;t02o}{o00,f02(o02)}
case 0x0006: return struct{t01o;t02o}{f01(o01),f02(o02)}
//...
case 0x00fd: return struct{t00i;t02o;t03i;t04i;t05o;t06o;t07i}{o00,f02(o02),o03,o04,f05(o05),f06(o06),o07}
case 0x00fe: return struct{t01o;t02o;t03i;t04i;t05o;t06o;t07i}{f01(o01),f02(o02),o03,o04,f05(o05),f06(o06),o07}
case 0x00ff: return struct{t00i;t01o;t02o;t03i;t04i;t05o;t06o;t07i}{o00,f01(o01),f02(o02),o03,o04,f05(o05),f06(o06),o07}
}
panic("unreachable")
That's a big switch statement! But believe it or not this actually works.
I've used this technique for the contextaware package as well as for Datadog's tracing library, dd-trace-go.
Next year Go will get type parameters, which add generic programming to the language and may provide a far superior way of doing the same thing.
† Compiler-optimized jump tables were recently implemented but have not yet been released. In the meantime, the compiler already uses binary search for switch statements over constant cases.