24: Reading and Writing Files
WARNING This exercise is in DRAFT status, so there may be errors. If you find any, please email me at help@learncodethehardway.com so I can fix them.
I believe that Go's file Input/Output (I/O or just IO) is its weakest, weirdest, and most poorly designed feature. I'm saying this not because I hate Go, but because I want you to understand that if this seems weirdly complicated compared to the rest of the language then you are correct. It is weird.
A short list of my criticisms of Go's IO:
- The same operations are spread across multiple packages with seemingly no thought put into why that is.
- Its design doesn't feel very "Go-like," but more like what someone who is used to Java would want.
- There are multiple conflicting ways to read and write files, with no guidance on why you'd pick one or the other.
- There's no documentation. Seriously, go look at the Go documentation and try to find any document that covers reading and writing files. This is so bizarre compared to any other language ever invented since the dawn of computing.
- Doing even simple things like reading each line of a file is overly complex and can be done in various different ways.
- Many of the APIs require you to know the size of the data, and read into a fixed size buffer, which always causes problems with efficient reading/writing data.
- Even stranger, the buffered IO package bufio replicates this bizarre "read data into my byte buffer" design, but it's a buffered IO system that's also buffering the data, so...it should do this for you.
- The supposed performance gains of this design are extremely dubious, especially with modern file systems and modern drives. Go's many IO designs really are from the early 2000s when Java was king and disks sucked. Today our disks are nearly the same as our RAM so this whole "multiple structs that read and write buffers" design doesn't take advantage of that.
I could go on, but the problem is none of this really matters because you will have to deal with code that uses all of these different IO packages. You'll run into one person who loves ReadLine() and another who loves Scan() and yet another who does everything by calling Read() directly on an os.File. That's ultimately not their fault, because Go doesn't even have documentation explaining which one to use when.
In this exercise I'll expose you to all the various ways I could find to do one simple task:
Copy one file to another, either directly or line by line.
This is the most basic thing, so we'll see how Go does it. In some cases, Go does well, and in others...it's a horror show.
os.ReadFile/os.WriteFile Style
NOTE For all of these examples I'm using a simple build.log file but you can find your own. Any file will do.
The first, and simplest, is to use os.ReadFile and os.WriteFile from the Go os package. I think this is probably the one you'll use 80-90% of the time:
Source file: ex24a_os_read_write/main.go

package main

import (
	"log"
	"os"
)

func main() {
	// Read the whole file into memory in one call.
	in_data, err := os.ReadFile("build.log")
	if err != nil {
		log.Fatal(err)
	}
	// Create (or truncate) out.log and write every byte to it.
	err = os.WriteFile("out.log", in_data, 0644)
	if err != nil {
		log.Fatal(err)
	}
}
As you can see, easy. Just call function, get data and error, if no error then file is read. Write file by do opposite. Grug.
WARNING Some Go programmers will claim that this has low performance but never believe someone claiming performance who also doesn't have statistics for your specific use case. Even then don't believe them as everyone gets statistics wrong. Most likely only 3% of Go applications out there need to use a different API for performance, and I'd wager that all of the APIs available are lacking.
io.Copy Style
We start to get more complicated with the io.Copy style of copying one file to another. The Go io package has many functions for IO, but it still relies on os to actually open the files, so it's...not an IO API? Here's how you would use io to do this:
Source file: ex24b_io_copy/main.go

package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	build_log, err := os.Open("build.log")
	if err != nil {
		log.Fatal(err)
	}
	// Defer Close only after the error check, once the file is known to be open.
	defer build_log.Close()

	// O_TRUNC empties an existing out.log so no stale bytes survive the copy.
	out_file, err := os.OpenFile("out.log", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer out_file.Close()

	// Destination comes first, source second.
	n, err := io.Copy(out_file, build_log)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("copied", n, "bytes")
}
Notes on this "design":
- I have to use os.OpenFile() to open a file...for the io package to use it. What? Why?
- I have to call my own defer Close() on every file, even though Copy could close it because...it's done copying? Usually I wouldn't try to reuse the file after, but I guess someone said, "Well, ahhhhctuallllly there's 1% of the Go programmers who want to do something jank like reusing a dubious previously opened file so we kept it."
- Once again, the io.Copy() parameters are backwards. I never in my life would say "Copy to this file, from that file." I always say "copy this file to that file" but for some insane reason this pattern pops up on any "from to" operation. Incidentally, this was also a massive source of bugs in C, so it's awesome they were replicated here.
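A possible explanation for the first two notes, offered as a guess rather than an official rationale: io.Copy only sees the io.Reader and io.Writer interfaces, so it cannot tell whether it was handed a file and has nothing it could open or close. The sketch below uses a hypothetical copyText helper to run io.Copy between a strings.Reader and a bytes.Buffer, with no files and no os package involved.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
)

// copyText is a hypothetical helper for illustration only. It pushes a
// string through io.Copy into an in-memory buffer and returns the result.
func copyText(text string) (string, int64, error) {
	src := strings.NewReader(text) // an io.Reader that is not a file
	var dst bytes.Buffer           // an io.Writer that is not a file

	// Same argument order as with files: destination first, then source.
	n, err := io.Copy(&dst, src)
	return dst.String(), n, err
}

func main() {
	out, n, err := copyText("line one\nline two\n")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("copied %d bytes: %q\n", n, out)
}
```

That covers the mechanics only. It does nothing to make the destination-first argument order feel less backwards.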
Raw io Style
Next we have using "raw io" to read the file manually. This is mostly the same as with io.Copy but instead you call Read and Write on the opened files yourself (the same methods the io package's Reader and Writer interfaces describe):
Source file: ex24c_raw_io/main.go

package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	build_log, err := os.Open("build.log")
	if err != nil {
		log.Fatal(err)
	}
	defer build_log.Close()

	// Ask the file for its size so the buffer can hold all of it.
	finfo, err := build_log.Stat()
	if err != nil {
		log.Fatal(err)
	}
	in_data := make([]byte, finfo.Size())

	// A single Read is not guaranteed to fill the buffer, so keep n.
	n, err := build_log.Read(in_data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("read", n, "bytes")

	out_file, err := os.OpenFile("out.log", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer out_file.Close()

	// Write only the n bytes that were actually read.
	_, err = out_file.Write(in_data[:n])
	if err != nil {
		log.Fatal(err)
	}
}
While this version is longer there's nothing too odd about it. It's just your typical "open file, read bytes, open output, write bytes" set of operations.
I'd say this is more useful when the file is massive and can't reliably fit into system memory. For example, processing a video would be stupid with os.ReadFile. For that case you would reuse a smaller fixed-size buffer in a loop rather than sizing it to the whole file as this example does. Usually though, quickly bringing small to medium files into memory with os.ReadFile is going to be faster than trying to read many tiny chunks.
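For the massive-file case, a rough sketch of reading in chunks might look like the following. The copyInChunks and demo helpers, the temporary file names, and the 4 byte buffer are all made up for illustration. A real program would pick a much larger buffer and measure it.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
)

// copyInChunks is a hypothetical helper. It copies srcPath to dstPath
// through one reusable buffer of bufSize bytes, so memory use does not
// grow with the size of the file. It returns the bytes written.
func copyInChunks(dstPath, srcPath string, bufSize int) (int64, error) {
	if bufSize <= 0 {
		bufSize = 32 * 1024 // arbitrary default, not a tuned value
	}
	src, err := os.Open(srcPath)
	if err != nil {
		return 0, err
	}
	defer src.Close()

	dst, err := os.Create(dstPath) // creates or truncates dstPath
	if err != nil {
		return 0, err
	}
	defer dst.Close()

	buf := make([]byte, bufSize)
	var total int64
	for {
		n, readErr := src.Read(buf)
		if n > 0 {
			// Only buf[:n] is fresh data from this Read.
			if _, err := dst.Write(buf[:n]); err != nil {
				return total, err
			}
			total += int64(n)
		}
		if readErr == io.EOF {
			return total, nil // normal end of file
		}
		if readErr != nil {
			return total, readErr
		}
	}
}

// demo writes content to a temporary file, copies it in chunks, and
// returns what landed in the copy. It exists only so the sketch runs
// without a build.log on disk.
func demo(content string, bufSize int) (string, int64, error) {
	dir, err := os.MkdirTemp("", "ex24chunks")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	in := filepath.Join(dir, "in.log")
	out := filepath.Join(dir, "out.log")
	if err := os.WriteFile(in, []byte(content), 0644); err != nil {
		return "", 0, err
	}
	n, err := copyInChunks(out, in, bufSize)
	if err != nil {
		return "", n, err
	}
	copied, err := os.ReadFile(out)
	return string(copied), n, err
}

func main() {
	copied, n, err := demo("first line\nsecond line\n", 4)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("copied %d bytes: %q\n", n, copied)
}
```

To try it on your own file, call copyInChunks("out.log", "build.log", 32*1024) from main instead of demo.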
bufio.ReadString Style
I'll be honest my friend, this one makes my blood boil. I tried rewriting this to be "simple" multiple times. I agonized over it for several hours. Yes, hours for something as simple as "read a file line by line." In almost any other language this would be easy, and in Go it's some of the strangest IO code I've ever seen.
And I've used Prolog!
Source file: ex24d_bufio/main.go

package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	build_log, err := os.Open("build.log")
	if err != nil {
		log.Fatal(err)
	}
	defer build_log.Close()

	out_file, err := os.OpenFile("out.log", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer out_file.Close()

	in_rd := bufio.NewReader(build_log)
	lines := 0
	for {
		line, err := in_rd.ReadString('\n')
		// ReadString returns whatever it read even when err is io.EOF,
		// so write first or a last line without '\n' gets dropped.
		if len(line) > 0 {
			out_file.WriteString(line)
			lines++
		}
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("read", lines, "lines")
}
The documentation for bufio has everything you need, but I present a challenge to you dear friend:
CHALLENGE Prove me wrong. Can you rewrite that for loop to not be so weird? It has to work the same, but not have that inner if. I'd love to see someone make this work because without some unknown feature I think this is just what you do.
I believe that nobody really uses this, so just remember it in case you do see it.
bufio.Scan Style
The bufio.Scan style, built on bufio.Scanner, is not half bad for doing "line-by-line" processing, and it's probably what you should use if you want to do that:
Source file: ex24e_scan/main.go

package main

import (
	"bufio"
	"log"
	"os"
)

func main() {
	build_log, err := os.Open("build.log")
	if err != nil {
		log.Fatal(err)
	}
	defer build_log.Close()

	out_file, err := os.OpenFile("out.log", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer out_file.Close()

	scan := bufio.NewScanner(build_log)
	for scan.Scan() {
		// Text() drops the line ending, so put a newline back on.
		line := scan.Text()
		out_file.WriteString(line + "\n")
	}
	// Scan() returns false on both EOF and failure; Err() tells them apart.
	if err := scan.Err(); err != nil {
		log.Fatal(err)
	}
}
You can find the documentation in bufio again, and this API is rather nice. If you have to process a file line-by-line then I suggest this one.
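One thing to watch for with this style: a Scanner gives up with bufio.ErrTooLong when a single line is larger than its buffer limit, which defaults to bufio.MaxScanTokenSize (64 KiB). Here is a small sketch of setting that limit yourself with Scanner.Buffer. The countLines helper is hypothetical, and it scans in-memory strings instead of build.log so it runs anywhere.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// countLines is a hypothetical helper. It scans text line by line and
// stops with an error if a line does not fit within maxLine bytes.
func countLines(text string, maxLine int) (int, error) {
	scan := bufio.NewScanner(strings.NewReader(text))

	// Start from an empty buffer and let the Scanner grow it as needed,
	// but never beyond maxLine bytes. Buffer must be called before Scan.
	scan.Buffer(make([]byte, 0), maxLine)

	lines := 0
	for scan.Scan() {
		lines++
	}
	// Err() is nil on a clean EOF and non-nil if scanning stopped early.
	return lines, scan.Err()
}

func main() {
	n, err := countLines("alpha\nbeta\ngamma\n", 64)
	fmt.Println("short lines:", n, err)

	// One short line followed by one that is far over the 64 byte limit.
	long := "ok\n" + strings.Repeat("x", 200) + "\n"
	n, err = countLines(long, 64)
	fmt.Println("long line:", n, errors.Is(err, bufio.ErrTooLong))
}
```

If your log files can contain very long lines, raising the limit this way is one option. Falling back to the ReadString style is another.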
Root.FS Style
One final style of working with files is to use the Root.FS system to operate on files in a specific location. This is very useful in systems like webservers where you want to load files from only a single directory and not any outside of it. It's also useful if you want to scan a directory for many files, search (aka Glob) for files by name, and pass an entire directory location to another function.
Source file: ex24f_root_fs/main.go

package main

import (
	"log"
	"os"
)

func main() {
	// Every path used through root is resolved inside the current directory.
	root, err := os.OpenRoot("./")
	if err != nil {
		log.Fatal(err)
	}
	defer root.Close()

	in_data, err := root.ReadFile("build.log")
	if err != nil {
		log.Fatal(err)
	}
	err = root.WriteFile("out.log", in_data, 0644)
	if err != nil {
		log.Fatal(err)
	}
}
The documentation for Root is in os#Root and the documentation for FS is in io/fs#FS...'cause that makes total sense.
As you can see, the Root.FS system mostly replicates the various useful IO functionality from other systems, and it's localized to where you specify in the os.OpenRoot() call. This makes it easier to ensure you only get files in that directory.
Another useful thing about Root.FS is you can open one, and then pass that to various functions for processing the whole directory. It's a lot easier to start off making a Root.FS than to pass a string around to every function that needs to process the contents of a directory.
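Here is a rough sketch of what "pass the directory around" can look like. The logNames, totalSize, and demo helpers are hypothetical, as are the file names. Both helpers accept an fs.FS, which is the interface type that Root.FS() returns. As an assumption to keep the sketch runnable on older Go toolchains, demo builds the fs.FS with os.DirFS instead, which does not confine access the way os.OpenRoot aims to.

```go
package main

import (
	"fmt"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// logNames is a hypothetical helper. It takes a whole directory as an
// fs.FS value and returns the names of the *.log files at its top level.
func logNames(dir fs.FS) ([]string, error) {
	names, err := fs.Glob(dir, "*.log")
	if err != nil {
		return nil, err
	}
	sort.Strings(names)
	return names, nil
}

// totalSize is a second hypothetical helper that receives the same
// value and adds up the sizes of the named files.
func totalSize(dir fs.FS, names []string) (int, error) {
	total := 0
	for _, name := range names {
		data, err := fs.ReadFile(dir, name)
		if err != nil {
			return 0, err
		}
		total += len(data)
	}
	return total, nil
}

// demo builds a throwaway directory so the sketch runs anywhere.
func demo() (string, int, error) {
	tmp, err := os.MkdirTemp("", "ex24fs")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(tmp)

	files := map[string]string{
		"a.log":     "one\n",
		"b.log":     "two two\n",
		"notes.txt": "skip me\n",
	}
	for name, body := range files {
		if err := os.WriteFile(filepath.Join(tmp, name), []byte(body), 0644); err != nil {
			return "", 0, err
		}
	}

	// Assumption: os.DirFS stands in for root.FS() here.
	dir := os.DirFS(tmp)
	names, err := logNames(dir)
	if err != nil {
		return "", 0, err
	}
	size, err := totalSize(dir, names)
	return strings.Join(names, ","), size, err
}

func main() {
	names, size, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("log files:", names, "total bytes:", size)
}
```

Neither helper ever sees a path string for the directory itself. They only see names relative to whatever fs.FS they were handed.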
SECURITY WARNING If you are relying on this to prevent unwanted access to system files then you are probably not doing it right. I'm not completely clear on the Root.FS guarantees, but I'd wager they aren't good enough. It'll help, but you really should be using either a chroot jail or container to prevent unwanted file access in a network server.
Going Further
You are coming to the end of the second module so I think you should spend some time going back over the content covered so far. One highly effective study method is to study other code written by better programmers. Find code by any programmer you like, and analyze it. Take the time to recreate it as well. You'll typically learn more from building a replica than you would from only reading the code, but doing both is superb.
If you can't find a project, then check out my Go repositories and for a specific project related to this topic look at ssgod.
Next up, we make a game from the 80s.