Grep code feedback

Think of the feedback below as PR comments on your code. I have reviewed the Grep implementation written by many developers and condensed my common PR comments below. When you implement grep in Go, these are some things you should look into. These will help you evaluate whether your solution is good enough.

Argument Parsing and configuration

  • How are you parsing the CLI arguments? Are you parsing them manually by doing string splitting and using a giant switch case statement? If yes, then please use eitherflag.Args() from Golang standard library or Cobra. Go is suitable for writing CLI applications, and it provides really good support for doing typical tasks like parsing flags and arguments passed to the command. Make use of these packages and refactor your code accordingly. When you use Cobra, it will force a certain structure to your code. Read more about Cobra documentation and samples to find out. By the way, Docker, Kubernetes, and many other tools use Cobra for parsing arguments. When in doubt, for real-world projects, use Cobra.
  • How do you represent flags passed to the grep command inside your code? Have you thought about centralizing various flags in a struct called GrepConfig? If you centralize all the flags in a single struct, you can easily pass that struct to the grep method instead of passing each of the flags as separate parameters. Refactor your code to use a centralized config struct to represent all configuration options of the grep command.
  • Ideally, you should have a function that parses all arguments and converts them into an internal GrepConfig struct that you can pass to other methods. Otherwise, you’ll end up with grep method with multiple arguments for grep command options.

Code formatting and tooling

  • Ensure you have configured your code editor (we recommend using VS Code or Goland) so your code is auto-formatted using gofmt.
  • You should be able to run unit tests easily using your code editor, using a single button click or, better, using a keyboard shortcut.
  • You should be able to debug the code and test cases using breakpoints directly in your IDE.
  • Don’t add extra newlines when not needed. Newlines should be used in the code to improve readability and to separate functions and blocks of code.
  • Follow Go’s standard variable naming convention - capital letters for exported variables, lowercase letters for unexported variables and functions, camelCase for variables and methods, etc. Read https://go.dev/doc/effective_go for more detailed info.

Unit testing and error handling

  • Have you written test cases for the code? Can you identify edge cases and failure points where grep can fail? Some examples are: you run grep in a directory where you don’t have read permissions. How does your program behave in that case? Does it fail with a proper error message or panic abruptly? Can you write a unit test for this scenario? What other error scenarios can you think of?
  • Do you know about table-driven tests in Go? Can you identify the test scenarios for grep and refactor the code to use table-driven tests? Did you know that you can use VS Code feature to generate test cases by right-clicking the method name? It will auto generate table driven tests for that function.
  • Are you testing the case where your program reads from STDIN? If not, how can you test your program, in this case, automated manner (without manually passing the input)? Find out how to write a test for reading input via a standard input stream. (hint: you can use io.Reader interface to make it easier to "mock" standard input stream in Go). Search about this on StackOverflow and come up with some solution.
  • Write tests in standard got and want fashion. Understand the difference between t.Errorf and t.Fatalf and use t.Errorf unless you really need to fail and stop the execution of the entire test suite.
  • Don’t panic unless the program absolutely cannot recover from failures. Use appropriate error handling for all error conditions (business-related error cases).
  • Use handle or declare rule for error handling. Either your method returns an error or handles it (by printing/logging it or converting the error into an appropriate business value).
  • Are you creating the data needed for the test cases (directory and file structures along with some dummy file contents), or do you have a pre-defined folder structure for testing? How will you ensure your test cases run on any other person’s machine (or even on CI)?
  • Have you tried testing your program in a directory tree containing many thousand files? How about testing the code on large files (GBs in size)? Does it work as expected or does it fail silently? Does it go out of memory (if you try to run it with some memory limits)? How are you load-testing your implementation of grep?

Power of interfaces

  • Do you have two functions - one for reading from STDIN and the other for reading from a file? Is it possible to abstract the reading part into a common functionality? If you still didn’t get the hint, try using io.Reader interface to create this common abstraction for both STDIN and os.File. When you do refactor your code to use io.Reader, notice how your test cases become simpler.

Performance (when implementing -r for recursive grep in a directory)

  • Does Golang provide any standard library function to traverse through directories recursively? How does it work (depth-first or breadth-first fashion)?
  • Are you reading the entire file in memory as a single string or a []string and then grepping in that string? Your program will work fine for small files, but can it grep in a 10GB file where your laptop RAM is 8GB? Can you refactor your code to grep the string from a file in a “line by line” fashion? Basically, don’t read the entire file in memory as a single string (or even []string). Instead, read the file line by line, and for each line, when you have a match, add the matching line in the final results being printed.
  • Can you think of performance optimization when you have to grep into a large directory (containing thousands of files)? You can use filepath.Walk() to walk through all the files in a directory and use a goroutine to grep into that file for each file.
  • Are you using multiple gorountines to grep into files? How will you break your work into multiple goroutines? Will you use one goroutine per file or one goroutine for a directory? Also, how will you ensure the results from multiple goroutines are not interleaved in the final output? (hint: you’ll need to take a mutex lock on the final writer).
  • Have you tried running grep in a large directory (e.g. /home/user, or /)? Is it working fine? Can you get the results, or does the program fail with an error like “too many file descriptors”? How can you solve this? (hint: limit the number of max file descriptors by using a buffered channel). If you’re using a goroutine to read from a file, you’ll need to limit the concurrency of goroutines by having a maxOpenFileDescriptors variable.

Handling Results

  • Do you think creating a struct for storing grep results makes sense? This way, you can pass it around easily (instead of printing it immediately and performing that ugly side effect). Having a grepResult struct also allows you to implement the fan-in pattern to push all results into a channel and print them in some order (without interleaving the output from files).
  • How do you ensure not to have interleaving of outputs? i.e., all grep result lines from a single file should be in one place. They should not interleave with grep results from other files. This means you need a way to fan-in the results into a serial order to print them one file at a time.

Others (Misc)

  • Can you exclude files (e.g., binary files or image or movie files) when grepping text? Can you implement some flag (e.g. --exclude-files=jpg,mov,png) to exclude all these file types while traversing the directory during recursive grep? Alternatively, you can implement another flag (e.g. --include-files=txt,md,go) to only include relevant file types when grepping in a directory.
  • For CLI programs such as grep, we should NOT implement interactivity. i.e. Don’t make the CLI behave in an interactive way, where you first ask user the search string, then ask your (via a prompt) the file/directory to grep into. DON’T DO THIS. EVER. A good CLI should be non-interactive and follow Unix Philosophy (read more here - http://www.catb.org/~esr/writings/taoup/html/ch01s06.html). Don’t break the following important rule of Unix Philosophy.
    • Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter the output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
  • Ideally, you should not have multiple fmt.Printf calls throughout your code. Most functions should be returning data (or pushing the data onto a channel) instead of performing the side effect of printing the output directly. Always be lazy and write pure functions without side effects as much as possible.
  • Arrange your code and functions such that they follow the step-down rule. This way, reading the code becomes much easier.