Tree code feedback

Think of the feedback below as PR comments on your code. I have reviewed the Tree implementation written by many developers and condensed my common PR comments below. When you implement tree in Go, these are some things you should look into. These will help you evaluate whether your solution is good enough.

Argument Parsing and configuration

  • How are you parsing the CLI arguments? Are you parsing them manually by doing string splitting and using a giant switch case statement? If yes, then please use eitherflag.Args() from Golang standard library or Cobra. Go is suitable for writing CLI applications and provides good support for typical tasks like parsing flags and arguments passed to the command. Make use of these packages and refactor your code accordingly. When you use Cobra, it will force a certain structure to your code. Read more about Cobra documentation and samples to find out. By the way, Docker, Kubernetes, and many other tools use Cobra for parsing arguments. When in doubt, for real-world projects, use Cobra.
  • How do you represent flags passed to the tree command inside your code? Have you thought about centralizing various flags in a struct called TreeConfig? If you centralize all the flags in a single struct, you can easily pass that struct to the tree method instead of passing each of the flags as separate parameters. Refactor your code to use a centralized config struct to represent all configuration options of the tree command.
  • Ideally, you should have a function that parses all arguments and converts them into an internal TreeConfig struct that you can pass to other methods. Otherwise, you’ll end up with tree method with multiple arguments for tree command options (-f, -d, -L, -t, etc)
  • The TreeConfig struct should contain bool and ints as data type for the fields in it (to represent whether to print only directories, how many levels to print, etc). If you use just raw strings, you’re not creating an abstraction and might as well not create this struct.
  • Create value as boundaries. Watch https://www.destroyallsoftware.com/talks/boundaries this talk to understand more. Datatypes should be close and accurate, e.g., bool in TreeConfig. There should be two separate structs - one for representing TreeConfig internally (via bool and int) and the other for representing how we accept CLI args for tree, e.g., TreeArgs. TreeArgs struct should have short and long descriptions of the arg as a string property.

Code formatting and tooling

  • Ensure you have configured your code editor (we recommend using VS Code or Goland) so your code is auto-formatted using gofmt.
  • You should be able to run unit tests easily using your code editor, using a single button click, or, better, using a keyboard shortcut.
  • You should be able to debug the code and test cases using breakpoints directly in your IDE.
  • Don’t add extra newlines when not needed. Newlines should be used in the code to improve readability and to separate functions and blocks of code.
  • Follow Go’s standard variable naming convention - capital letters for exported variables, lowercase letters for unexported variables and functions, camelCase for variables and methods, etc. Read https://go.dev/doc/effective_go for more detailed info.

Unit testing and error handling

  • Have you written test cases for the code? Can you identify edge cases and failure points where tree can fail? Some examples are: you run tree in a directory where you don’t have read permissions. How does your program behave in that case? Does it fail with a proper error message or panic abruptly? Can you write a unit test for this scenario? What other error scenarios can you think of? Does your program behave the same as the native tree command in Linux?
  • Do you know about table-driven tests in Go? Can you identify the test scenarios for tree and refactor the code to use table-driven tests? Did you know that you can use VS Code feature to generate test cases by right-clicking the method name? It will auto generate table driven tests for that function.
  • Write tests in standard got and want fashion. Understand the difference between t.Errorf and t.Fatalf and use t.Errorf unless you really need to fail and stop the execution of entire test suite.
  • Don’t panic unless the program absolutely cannot recover from failures. Use appropriate error handling for all error conditions (business-related error cases).
  • Use handle or declare rule for error handling. Either your method returns an error or handles it (by printing/logging it or converting the error into an appropriate business value).
  • Have you tried testing your program on a large directory tree? (e.g.,/home or / directory)? Does it work as expected, or does it fail silently? Does it go out of memory (if you try to run it with some memory limits)? How are you load-testing your implementation of tree?
  • Does your program produce output that exactly matches the real tree command? If not, are you using the correct characters (└ and ├)?
  • Have you written test cases for testing JSON and XML output?
  • Are you creating the data needed for the test cases (directory and file structures), or do you have a pre-defined folder structure for testing? How will you ensure your test cases run on any other person’s machine (or even on CI)?

Performance and core recursive walking logic

  • Does Golang provide any standard library function to traverse through directories recursively? How does it work (depth-first or breadth-first fashion)? What is the output that the real tree command generates? Can we reuse the standard library function, or do you need to write your own custom recursive function?
  • In your recursive logic, you’ll need to implement a “return-early” feature when a level is specified. Instead of traversing the entire dir tree, you’ll need to return early after you hit the level specified by the user (-L 3 would mean return from the third level, don’t traverse to 4th level).
  • As you implement the program, you will have to make a decision.
      1. Compute the entire tree structure in memory (maybe using a tree data structure). Or
      1. Print the tree structure as you traverse it
  • What are the pros and cons you see with each of these approaches? What if you want to run tree program on a / directory, and that directory contains many millions of files? Will your program go out of memory? Or will it print the output as it traverses the entire directory structure?
  • What changes will you need to make in your program to switch from one approach to another in the above case? What’s your preferred approach (from the user’s point of view)? Is it:
      1. compute everything in memory and print at last OR
      1. start printing the directory structure as you traverse
  • If you decide to use a tree data structure, what are you storing in this tree struct? You’ll end up writing the depth-first tree traversal algorithm to print the tree.
  • If you print the directory structure as you go, how will you “look ahead” or find out whether you’re on the last file in the directory? i.e., decide whether to print │─ or └─? If you’re on the last file in the directory, you need to print └─. But if there are more files in the directory, you need to print │──. This is a tricky part of the logic.
  • Like you probably saw during implementing command, is there any way you can use goroutines in tree? If yes, how and where will you use goroutines?

Handling JSON, XML, and normal tree printing

  • Use names from the problem domain. e.g., The problem specifies that you print a report summary indicating how many files and directories tree has scanned. In this case, it would be nice to create a struct named Report that contains files and dirs as two int variables to indicate how many files and directories has the tree command scanned. This way, you can convert this struct into any format as needed.
    • XML
      <report> <directories>32</directories> <files>384</files> </report>
      36 directories, 932 files
  • You might write three separate traverser implementations (for printing normal tree output, JSON output and XML output). Try to reuse code as much as possible among these three separate traversals. Move the common parts into structs and provide three separate stringer implementations for that struct.

Others (Misc)

  • Your code will mostly evolve to a single function using recursion or filepath.Walk(). This function should return a string output (instead of printing it). This way, you can test this function with various cases (single dir, multiple nested dir, etc.) and all flags (-f, -l, -p, -d, etc.) via tests too.
  • There should ideally be only one fmt.Print() call in the whole source code. This depends on your approach (compute everything in memory and print only once). This way, you minimise the side effects and have them at only the main function level.
  • Even if you’re printing the tree as you traverse, instead of directly calling fmt.Print(), return the string to the caller or push it to a channel. That way, the caller can pull values from the channel and print them one at a time, again from main function. This is the core concept of Functional Core and Imperative Shell. If you just print the values instead of returning them, your test cases will be very difficult to write (you’ll have to test for the side effect of printing). Hence, return String values instead of printing them as much as possible.
  • Spend more time understanding what the problem statement demands. Try out the Linux tree command and handle errors in your program as the tree command does.
  • Don’t make CLI commands interactive unless needed. Instead of asking for user input via dialogues, let the user specify all the flags at once. Interactivity is for UI, not for CLI programs. See Unix Philosophy.
  • Some flags for tree command can be combined. e.g., -f and -p. This should print only files and their permissions. If someone also adds the -L 3, then the all these flags should be ANDed together and tree structure only till level 3 should be shown. Write your program such that it handles these cases. Make the code reusable accordingly.
  • Make functions private when they don’t need to be public. Make Recursion func for JSON and XML private.
  • Use named params in Struct for better readability. e.g., TreeConfig{true, false, true, false} is not readable. Compare this to TreeConfig{onlyDir: true, relPath: false, xmlFormat: true, filePerm: false}. The latter is more readable than the former.