
MongoDB Oplog to SQL code feedback

Think of the feedback below as PR comments on your code. I have reviewed parser implementations written by many developers and condensed my most common PR comments here. When you implement this parser in Go, these are things you should look into; they will help you evaluate whether your solution is good enough.
If you prefer a video format, we conducted live coding sessions for this problem statement. Details here:
  1. Stories 1, 2 and 3
  2. Stories 4, 5, 6 and 7
  3. Stories 8 and 9 (with sequential and concurrent implementations using multiple goroutines)

A short demo of the problem statement and solution

Argument parsing and configuration
  • Cobra for CLI Applications: Consider using the popular library Cobra for writing CLI applications. Cobra provides a robust and well-tested framework for parsing flags and arguments passed to the command. It also enforces a structured approach to organizing command-line tools, making your code more maintainable and easier to understand. Many widely-used tools like Docker and Kubernetes utilize Cobra for their CLI implementation, making it a reliable choice.
  • Separate Configurations from Code: Avoid hardcoding configuration parameters, such as MongoDB endpoints, PostgreSQL hosts, and credentials, directly into the code. Instead, adopt a configuration management approach, where configurations are stored separately from the code. Options for configuration management include environment variables, configuration files, etc.
  • Config Loading and Validation: Implement a configuration loader that reads the required configurations from the chosen source (environment variables, configuration files, etc.). Ensure proper validation and handling of missing or invalid configurations to prevent runtime errors.
Code formatting and tooling
  • Auto-formatting: Set up your code editor to auto-format using gofmt for consistency.
  • Unit Test Integration: Enable easy one-click execution of unit tests in your IDE.
  • Integrated Debugging: Set up your IDE to allow seamless debugging of the code and test cases using breakpoints.
  • Mindful Newline Usage: Avoid stray blank lines; use them only to separate logical blocks of code.
Unit testing and error handling
  • Have you written test cases for the code? Can you identify edge cases and failure points where the parser can fail? For example: you run the parser with an input oplog file for which you don’t have read permissions. How does your program behave in that case? Does it fail with a proper error message, or does it panic abruptly? Can you write a unit test for this scenario? What other error scenarios can you think of?
  • Do you know about table-driven tests in Go? Can you identify the test scenarios for the parser and refactor the code to use table-driven tests? Did you know the VS Code Go extension can generate test cases for you (right-click the function name, or run "Go: Generate Unit Tests For Function")? It will auto-generate a table-driven test skeleton for that function.
  • Write tests in the standard "got" and "want" fashion. Understand the difference between t.Errorf and t.Fatalf: t.Fatalf stops the current test function (not the entire suite), so use t.Errorf unless there is no point continuing the test after the failure.
  • Follow the "handle or declare" rule for error handling: either handle an error where it occurs, or return it to the caller. When returning errors, wrap them with additional context (for example, fmt.Errorf with the %w verb) to retain important information about the error's origin.
  • Have you conducted testing on your program with large datasets, such as having 2-3 million documents across multiple MongoDB databases and collections? Did it perform as expected, or did it encounter silent failures? Have you observed any memory issues when running it with memory limits? How have you load-tested your parser implementation to ensure its reliability under heavy loads?
  • Have you written test cases to verify that the generated SQL statements can be executed without errors on the PostgreSQL database?
  • Are you providing the necessary test data (oplog.json file) for the test cases, or do you require a pre-existing MongoDB instance running locally for testing? To ensure your test cases are portable and can run on any other person's machine or Continuous Integration (CI) environment, consider using mock data or creating a setup that can automatically generate the required test data during testing. This approach will make your test suite self-contained and independent of external dependencies.
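To make the table-driven, "got"/"want" pattern concrete, here is a sketch of a test for a hypothetical convertInsert function. The function itself is a simplified stand-in for your real oplog-to-SQL converter, included only so the example is self-contained; in your project the test function would live in a _test.go file:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"testing"
)

// convertInsert is a deliberately tiny stand-in for the real converter:
// it builds an INSERT statement from a namespace and a document.
func convertInsert(ns string, doc map[string]any) (string, error) {
	if ns == "" {
		return "", fmt.Errorf("convertInsert: empty namespace")
	}
	cols := make([]string, 0, len(doc))
	for k := range doc {
		cols = append(cols, k)
	}
	sort.Strings(cols) // deterministic column order keeps tests stable
	vals := make([]string, 0, len(doc))
	for _, c := range cols {
		switch v := doc[c].(type) {
		case string:
			vals = append(vals, "'"+strings.ReplaceAll(v, "'", "''")+"'")
		default:
			vals = append(vals, fmt.Sprintf("%v", v))
		}
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s);",
		ns, strings.Join(cols, ", "), strings.Join(vals, ", ")), nil
}

// TestConvertInsert shows the table-driven shape: a slice of named cases,
// a t.Run subtest per case, and got/want comparisons.
func TestConvertInsert(t *testing.T) {
	tests := []struct {
		name    string
		ns      string
		doc     map[string]any
		want    string
		wantErr bool
	}{
		{
			name: "simple insert",
			ns:   "test.student",
			doc:  map[string]any{"name": "Selena", "age": 25},
			want: "INSERT INTO test.student (age, name) VALUES (25, 'Selena');",
		},
		{
			name:    "missing namespace",
			ns:      "",
			doc:     map[string]any{"name": "Selena"},
			wantErr: true,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := convertInsert(tt.ns, tt.doc)
			if (err != nil) != tt.wantErr {
				// Fatalf: no point checking output when the error state is wrong.
				t.Fatalf("convertInsert() error = %v, wantErr %v", err, tt.wantErr)
			}
			if got != tt.want {
				// Errorf: report the mismatch but let other subtests run.
				t.Errorf("convertInsert() = %q, want %q", got, tt.want)
			}
		})
	}
}

func main() {
	sql, _ := convertInsert("test.student", map[string]any{"name": "Selena"})
	fmt.Println(sql)
}
```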
Performance and orchestration logic to process the oplog
This parser is designed to efficiently parse the MongoDB oplog and synchronize data to a PostgreSQL database. Here's an overview of the steps performed by the parser:
  1. Read Oplogs from MongoDB: A goroutine continuously reads the oplogs from the running MongoDB instance and puts them into a channel. This allows for real-time streaming of changes from MongoDB.
  2. Fan-out Oplogs per Database: Another goroutine reads oplogs from the channel and fans them out into multiple channels, one channel per database. At this step, the parser also creates a separate SQL channel for each database to hold the SQL statements generated in step 4.
  3. Fan-out Oplogs per Collection: Separate goroutines read oplogs from the database channels and fan them out into multiple channels, one channel per collection. This step segregates the data by collection.
  4. Convert Oplogs to SQL Statements: Additional goroutines read oplogs from the collection channels, convert them into SQL statements, and post the SQL statements to the SQL channels created in step 2. This conversion ensures that the data is ready for ingestion into PostgreSQL.
  5. Execute SQL Statements: In the main goroutine, SQL statements are read from the SQL channels created in step 2. Separate goroutines are used for each SQL channel per database to execute the SQL statements on the running PostgreSQL instance. This step finalizes the synchronization of data from MongoDB to PostgreSQL.
Using goroutines and channels, the parser can effectively handle the high volume of data while ensuring data integrity and achieving concurrency, making the synchronization process efficient and scalable.
  • Use of buffered Channels and Evaluate their Sizes: Did you notice any performance improvements using buffered channels instead of unbuffered ones for communication between goroutines? Evaluate the buffer sizes for channels (oplogChan, databaseChan, collectionChan, sqlChan, etc.). Make sure they are chosen based on application requirements and system resources.
  • Resource Cleanup: It is important to perform proper cleanup and closure of resources such as MongoDB and PostgreSQL connections, as well as channels when the program finishes. Utilizing "defer" statements can be beneficial to ensure resources are correctly released.
  • Indicate Sender and Receiver Channels: To make the code more readable and self-explanatory, use directional channel types (chan<- T for send-only, <-chan T for receive-only) in function signatures to clearly indicate whether a channel is being used for sending or receiving values.
  • Concurrency Safety: Ensure no data races or race conditions exist in the code. Use synchronization mechanisms like Mutex to ensure safe concurrent access to shared resources.
  • Resource Consumption: Evaluate the resource consumption of the application, especially the number of goroutines and memory usage. Ensure it is within acceptable limits. See if you can make use of the net/http/pprof package (https://pkg.go.dev/net/http/pprof) to profile it.
  • Graceful Shutdown: Verify that the application handles graceful shutdowns correctly, especially when receiving cancellation signals.