Question

I am writing a function that will download an HTML post.

Having downloaded the string I will:

  • strip off the html tags
  • remove control characters such as \n (newlines)
  • remove trailing whitespace
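For reference, the three steps above could be sketched with the standard library alone. This is a minimal illustration rather than a definitive implementation: the names `clean_post` and `_TextExtractor` are hypothetical, and the sketch assumes that keeping every text node is acceptable (so the contents of `<script>` or `<style>` elements would also be kept).

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_post(raw_html):
    # Step 1: strip the HTML tags, keeping only the text between them.
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    text = "".join(extractor.parts)
    # Step 2: replace runs of newlines, carriage returns and tabs with a space.
    text = re.sub(r"[\r\n\t]+", " ", text)
    # Step 3: remove trailing whitespace (strip() also removes leading whitespace).
    return text.strip()
```

Because the cleaning lives in one function that takes a string and returns a string, it can be called either right after each download or later over a list of downloaded strings.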

My question is: should I first download all the content and then apply the modifications, or should I apply the modifications to each string right after it has been downloaded (and only then download the next one)?

This question is mostly about architecture and good practices. In this particular case, speed is not an issue because I don't have more than 1000 posts and this procedure will run once a month. The language is Python.

What is the best practice for this case? And what is the best practice in general?


Solution

Sorry, there is no "best practice" telling which of the two ideas is "better in general".

The only "best practice" that applies here is this:

  • Implement it in the way that is simplest for you, based on your current tool set, your current knowledge and the existing code base in your system. That should normally give you a maintainable, evolvable, and easy-to-debug solution.

  • Only if you encounter an observable resource bottleneck (memory or performance) with the simple approach, try a different, more complicated solution.

If you think both approaches are of equal complexity, flip a coin.
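To illustrate how close the two approaches are in complexity, here is a minimal sketch of both. Everything in it is hypothetical: `fetch` and `clean` stand for whatever download and cleanup functions you already have, and `fake_fetch`, `fake_clean` and `FAKE_SITE` exist only so the example runs without network access.

```python
def download_then_clean(urls, fetch, clean):
    # Variant A: download everything first, then clean in a second pass.
    raw_posts = [fetch(url) for url in urls]
    return [clean(raw) for raw in raw_posts]


def clean_as_you_go(urls, fetch, clean):
    # Variant B: clean each post right after downloading it,
    # and only then move on to the next download.
    for url in urls:
        yield clean(fetch(url))


# Stand-ins for demonstration only (assumptions, not a real site or API).
FAKE_SITE = {"post/1": "  first post\n", "post/2": "second post  "}


def fake_fetch(url):
    return FAKE_SITE[url]


def fake_clean(raw):
    return raw.strip()


urls = ["post/1", "post/2"]
batch = download_then_clean(urls, fake_fetch, fake_clean)
streamed = list(clean_as_you_go(urls, fake_fetch, fake_clean))
```

Both variants produce the same cleaned posts. Variant A holds all raw strings in memory at once, while Variant B holds one at a time, which is the kind of difference that only matters once a real bottleneck shows up.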

Licensed under: CC-BY-SA with attribution