How to Read Programmer Documentation

An excerpt from Learn Python the Hard Way, 5th Edition that explains how I analyze and learn from projects with poor or no documentation (which is most of them).

By Zed A. Shaw


This is an excerpt from Learn Python the Hard Way, 5th Edition exercise 55. I felt this exercise was so generally useful that I should copy it to the blog so people can refer to it if they're struggling with documentation. Keep in mind that I don't think Pandas' docs are bad. They just need a cohesive course to bring it all together.

This exercise is going to teach two very important skills. First, you'll learn about Pandas and its DataFrame construct. This is the most common way to work with data in the Python Data Science world. Second, you're going to learn how to read typical programmer documentation. This is a far more useful skill as it applies to every single programming topic you will ever encounter. In fact, you should think of this exercise as using Pandas to teach you how to read documentation.

Why Programmer Documentation Sucks

There's a concept in painting called "the gestalt." The gestalt of a painting is how all of the parts of a painting fit together to create a single cohesive experience. Imagine I paint a portrait of you and create the most perfect mouth, eyes, nose, ears, and hair you've ever seen. You see each part is perfect and then you pull back, and when placed together...they're all wrong. The eyes are too close together, the nose is too dark compared to everything else, and the ears are different sizes. On their own, they're perfect, but when combined into a finished work of art they're awful because I didn't also pay attention to the gestalt of the painting.

For something to have high quality you have to pay attention to the qualities of each individual piece, and to how those pieces fit together. Programmer documentation is frequently like this awful portrait with perfect features that don't fit together. Programmers will very clearly and accurately describe every single function, the nuances of every option to those functions, and every class they made. Then they completely skip any documentation that describes how those pieces fit together or how to use them to do anything.

This kind of documentation is everywhere. Look at Python's original sqlite3 documentation, then compare it to the latest version that finally explains how to use placeholders. That's a fairly important topic you need for good security and it's...just casually ignored for about a decade?
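If you haven't met placeholders before, here is a minimal sketch using only the standard library's sqlite3 module. The person table and its rows are made up for illustration. Each ? is a placeholder, so the value travels separately from the SQL text instead of being pasted into the string.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, age INTEGER)")

# Hypothetical sample rows; each ? is filled from the matching tuple item.
people = [("Alice", 30), ("Bob", 25)]
con.executemany("INSERT INTO person (name, age) VALUES (?, ?)", people)

# Pretend this string came from a user. Because it is bound through a
# placeholder, it is treated as plain data and never run as SQL.
user_input = "Alice' OR '1'='1"
rows = con.execute(
    "SELECT name, age FROM person WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # nobody has that literal name, so no rows match

# The same query with an ordinary value finds the matching row.
safe_rows = con.execute(
    "SELECT name, age FROM person WHERE name = ?", ("Alice",)
).fetchall()
print(safe_rows)

con.close()
```

Try rewriting the first query with string formatting instead of a placeholder and compare what comes back. That difference is the security topic the old documentation left out.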

Learning from this documentation requires a particular style of reading that's more active. That's what you will learn in this exercise.

How to Actively Read Programmer Docs

I won't force you to suffer through really bad documentation. Instead you'll take a baby step and learn how to read documentation using the Pandas documentation. The Pandas documentation is good. It at least has a quick start guide to get you going, cookbooks, how-to guides, an API reference, and lots of examples. Everything is clearly described, but when you read it you're still kind of lost because it's a lot of documentation spread all over with no clear curriculum.

This is where active reading comes into play, and it's something I've had you do for this entire course by making you type in code and change it. To read programmer documentation actively means you have to type in the code as you read, change the code to find out more, and apply what you learn to your own problems so you learn how to actually use it. Your goal with this process is to find the gestalt the programmers ignored.

Step 1: Find the Docs

The very first thing you should do is find the docs. You might laugh but sometimes that's a difficult first step. Important questions to ask:

  1. Are you looking at the right version of the docs? This is a very common problem in Python and JavaScript because sometimes the old documentation ranks higher in Google than the new documentation.
  2. Is this documentation a guide or an API description? You need at least a guide and API documentation. You actually need more than that, but if a project only has API documentation then you're going to have to work much harder to learn it. A guide is where you want to start.
  3. Is there a cookbook or how-to guide with lots of examples? You've found a unicorn in the world of programming.
  4. What are the most interesting topics to you? Do you have a specific pressing need? Is there a document covering this topic?
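For the first question, it helps to know which version you actually have installed so you can compare it with the version shown on the docs page. Here is a small sketch using the standard library's importlib.metadata. The helper name installed_version is made up for this example, and "pandas" is only the example package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if it isn't installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "pandas" is just the example here; use whatever project you're studying.
print("pandas:", installed_version("pandas"))
```

If this prints None the package isn't installed in the Python you're running, which is worth knowing before you start reading.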

Step 1 with Pandas

Let's go through the Pandas documentation and answer each of these questions:

  1. Yes, it looks like the documentation is the right version.
  2. The /docs/ section has both guides and an API reference. You'll need the guide to follow, and the API reference to look up specifics about things you use in your own projects later.
  3. Yes, there are both getting started tutorials that show you how to do various things, and a cookbook in the User guide with many quick examples.
  4. What are the most interesting topics to you? In this exercise you'll focus on DataFrame so any documents that cover that are useful. If you wanted to process many .csv files then you'd look for documents explaining loading and saving .csv files.
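As a taste of the .csv topic mentioned in the last item, here is a minimal sketch of loading and saving CSV data with a DataFrame. The CSV text is made up, and io.StringIO stands in for a real file so the example runs anywhere.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file on disk.
csv_text = "name,score\nAlice,90\nBob,85\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv with no path returns the CSV as a string.
# index=False leaves out the row labels.
round_trip = df.to_csv(index=False)
print(round_trip)
```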

Step 2: Determine Your Strategy

What do you do if most of these have "No" answers? What if the project only has auto-generated API docs and not a single document or example explaining how to use the API? First, do you have to use this pile of garbage? Life's too short to use software that not even the developers care about, so maybe just don't use it. If you really want to use it or have to use it, then you have two complementary strategies:

  1. Find guides and example code other people wrote about the project.
  2. Choose your own small project that will use this project, and spend your time reading the API docs to get your project working.

If the project has everything you need then you have a few different strategies:

  1. Start with any cookbooks and how-to documents with many examples.
  2. Start with the guides that walk through each topic the project thinks is important.
  3. Start with the API docs anyway and try to make your own software using the API.

These options are not mutually exclusive. Start with one option, and if it's not working, switch to another. Keep doing this until you understand enough to use the project or study further.

Step 2 with Pandas

In the Pandas example we have everything we need except an overall curriculum telling us where to go, so that's why you need a strategy. I have three complementary strategies in this situation:

  1. Start with the cookbooks and how-to documents and use those as a guide to dive deeper into related documentation.
  2. Start with the deeper user guide and as you go through it read cookbooks and how-to documents to get practical examples.
  3. Try to make something using the API reference. Sometimes this is the best strategy if you are hot to work on an idea, but don't get discouraged if it's too hard. If you get stuck, switch to the other strategies.

Step 3: Code First, Docs Second

This will seem counter-intuitive, but when reading programmer documentation you will have more success if you start with the code, then read about it. This works because the code is something you can experience, and that experience gives you a better understanding of what's being said in the documentation.

Step 3 with Pandas

Let's look at the 10 minute guide to Pandas as an example. Right away there's this code:

import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
s  # in Jupyter, a bare name on the last line of a cell displays its value

dates = pd.date_range("20130101", periods=6)
dates  # displayed the same way when run in its own cell

In the guide this code is interleaved with short descriptions of what it does, so type each example in first. Once it's working, change it around and then read the descriptions. This will make the descriptions easier to understand.

However, if you read the descriptions first this is what you read:

Customarily, we import as follows.

Creating a Series by passing a list of values, letting
pandas create a default RangeIndex.

Creating a DataFrame by passing a NumPy array with a
datetime index using date_range() and labeled columns:

Those on their own or with a quick glance at the code make almost no sense. After you get the code working these sentences help fill in gaps in your understanding. They also link to more documentation on what you just used.
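The last description talks about a DataFrame built from a NumPy array, a datetime index, and labeled columns. Here is a minimal sketch of what that sentence describes, so you have something to type in before rereading it. The random values and the column letters are placeholders, and your numbers will differ on every run.

```python
import numpy as np
import pandas as pd

# Six consecutive days to use as the row labels (the index).
dates = pd.date_range("20130101", periods=6)

# A 6x4 NumPy array of random numbers.
values = np.random.randn(6, 4)

# Rows are labeled by date, columns by the letters A through D.
df = pd.DataFrame(values, index=dates, columns=list("ABCD"))
print(df)
print(df.index)
print(df.columns)
```

After running it, reread the description and match each phrase ("NumPy array", "datetime index", "labeled columns") to the line of code that produces it.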

Step 4: Break or Change the Code

After you get a piece of code working take the time to break it so you can see how errors are handled. One massive blocker for beginners is deciphering the convoluted error messages programming languages produce. There's almost a weird art to reading them and using Google to find the answer. One of the ways to learn the "language of terrible errors" is to expose yourself to as many errors as possible on purpose so you can study them.
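Here is one small way to break a Series on purpose, as a sketch. The mismatch between three values and two labels is deliberate, and the try/except is only there so you can print the error and keep going.

```python
import pandas as pd

# Working version: three values with the default index 0, 1, 2.
s = pd.Series([1, 3, 5])
print(s)

caught = None

# Broken on purpose: three values but only two index labels.
try:
    pd.Series([1, 3, 5], index=["a", "b"])
except ValueError as error:
    caught = error
    # Read the message closely and note what it says about the lengths.
    print(type(error).__name__, "->", error)
```

Remove the try/except and run it again to see the full traceback, since that's what you'll face in real work.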

The second thing to do is ask if you can do something and then try to do it. You'll ask, "How do I give a Series a different index?" Or, you might ask, "How can I pass a Series to a DataFrame?" The kinds of changes you want to focus on are combinations of things you just learned.
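Those two questions can be answered with a few lines of experimenting. This is a sketch of one possible answer to each, not the only way to do it. The labels and column names are made up.

```python
import pandas as pd

# "How do I give a Series a different index?"
s = pd.Series([1, 3, 5], index=["a", "b", "c"])
print(s["b"])  # look up a value by its label

# "How can I pass a Series to a DataFrame?"
# Here the Series becomes one column and its index becomes the row labels.
df = pd.DataFrame({"odd": s})
print(df)

# Combine the two ideas: add a second column built from the same Series.
df["doubled"] = s * 2
print(df)
```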

Step 5: Take Notes

A key aspect of learning to code (or anything) is explaining what you've learned back to yourself. The best way to do this while you're working is to have a notes.txt file in the directory where you're putting the code you write. In this notes.txt write down questions you have, things you discover, and comments about what you're learning.

Another important part of the notes.txt is links. You should be recording links to what you read or what you need to read as you work. This will help you later when you need to remember where you read about something.

Step 6: Use it On Your Own

The entire purpose of this last module is to move you from someone who knows Python to someone who can use Python to express their own ideas. After you feel you have enough understanding of the project you should try to make something of any size with it. This is when you will switch to relying more on the API reference than the other documentation.

Step 6 with Pandas

If you're stuck and can't think of anything to create, then take an example from the cookbook or how-to documents and modify it to do something new. Maybe you have it load the data from a SQL database or change the data used.
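As a starting point for the SQL idea, here is a minimal sketch that loads query results into a DataFrame. It uses an in-memory SQLite database from the standard library so there's nothing to install beyond Pandas. The scores table and its rows are made up for the demo.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database with a made-up table for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
con.executemany(
    "INSERT INTO scores (name, score) VALUES (?, ?)",
    [("Alice", 90), ("Bob", 85), ("Cara", 70)],
)
con.commit()

# read_sql_query runs the query and returns the result as a DataFrame.
df = pd.read_sql_query(
    "SELECT name, score FROM scores WHERE score >= ? ORDER BY name",
    con,
    params=(80,),
)
con.close()
print(df)
```

From here you could swap in a cookbook example and feed it this DataFrame instead of the data the cookbook used.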

Step 7: Write About What You Learned

When I think of painting, writing, and programming I think of them as mediums for articulating my automatic thoughts, experiences, and feelings so I can consciously understand them. Painting helps me understand what I see. Writing helps me understand what I know and feel. Programming helps me understand how to do something.

I spend all day using my eyes to see the world, but it's only when I try to paint what I see that I start to consciously understand what I'm seeing. Painting forces me to consciously understand the automatic way my visual system processes the world.

Programming forces me to structure my understanding of how something works into logical steps and structures. After I turn a process or idea into code I understand how it could actually work.

Writing helps me organize my almost random thoughts into a coherent conscious structure. The act of organizing all of my thoughts into an essay that makes sense and flows naturally helps me further understand my ideas.

More importantly, each of these mediums--painting, programming, and prose--forces me to explore what I don't know. Externalizing my knowledge in these ways gives me a glimpse into my brain. I can look at a painting and say, "Well it looks like I have no idea what this flower actually looks like." I can study code and see, "I clearly have no idea how this algorithm is supposed to work." I can read through an essay and see, "I really don't know how to explain what I'm feeling about this topic."

This is why you should write about what you've learned. You don't have to show it to anyone or be a good writer. Your writing doesn't have to be original. I'll tell you, 99% of all writing is not original. The point is not to impress other people with how clever a writer you are. The point is to explain to yourself what you know so you can see if you actually learned something.

Step 7 with Pandas

For Step 7 I want you to write at least 8-10 paragraphs teaching someone else what you've just learned about DataFrames. How would you explain the DataFrame to someone who knows Python? What is your best advice on how to use it? Are there any things to avoid when using it?

Another option is to write your own curriculum to learn the Pandas DataFrame. If you were to write a curriculum for someone else, what links should they read, and in what order, to best understand it? For each link describe what they learn at that stage, and how it relates to the previous link they studied.

A final option is to use Jupyter to create a Notebook that demonstrates and explains everything someone else would need to learn. I suggest you first write a short version of the curriculum idea, and then turn that into a structured notebook that follows the curriculum.

Step 8: What's the Gestalt?

The final step in this process is to ask yourself, "What's the big picture for this project?" This is a more abstract step and should fall out naturally from your writings and notes, but being able to summarize the project will give you a mental framework to hold everything else you learn.

Your understanding of the project might be different from the authors', but your description of the project is more for you than a general statement for everyone.

Step 8 with Pandas

If I were to summarize the purpose of Pandas I might have several "gestalt statements":

  1. "Pandas' purpose is to provide Python with higher level features commonly found in other statistics and math languages like R, SAS, and Mathematica."
  2. "Pandas gives an easier way to load, structure, and manipulate tabular data for analysis."

Am I right? What did you come up with? Does it help you understand Pandas better?

Coming Soon...

While it is useful for you to learn how to read documentation and devise your own curriculum, I also feel you might need one provided by me. The problem is, projects frequently change and I want this course to last longer than the next version of Pandas. To solve this problem, I'll have a dedicated page of LPTHW errata that will also include my idea for a Pandas curriculum you can follow.
