Question

At my place of employment I have noticed a weird thing where developers will include largish libraries to do simple things. To be clear, we're a Scala shop. Here are two examples that occurred:

1) In one project we do CSV parsing, which was originally a 3-line function. Because the CSV creator did not consistently generate the correct format, the function grew to cover various corner cases. A junior developer decided to bring in scala-csv for its description of CSV formats and for an Exception class. He did not integrate the parsing of CSV to use the library until prompted. Because having two ways of parsing CSV in a system is stupid, we opted to use the library.

2) A senior developer brought in a Scala wrapper for Joda Time for its ordering ability. This was used in one isolated case. When prompted to either remove the library or integrate it fully, the only excuse given was "well we might need it later". Eventually the library was removed.

While both stories are different they highlight a strange mentality of just including whole libraries for utility functionality. In both cases I had to step in and insist we either integrate with the library and use its core functionality or remove it.

My question: is my insistence on either fully using the functionality a library offers, or else not including it and rolling our own, a bad approach? My concern is that carrying a library around for utility functionality just adds dependency problems for no benefit. For example, does the version of Joda that the wrapper library uses match our own?

Solution

Libraries are about leverage. In your first case, the problem of correctly parsing CSV, across all the many, many different (mis)interpretations of the format, has already been solved in the library. It is a perfectly good thing to do this the right way. Your software is better and smaller for it.

The primary expense of intellectual property such as computer software is the cost of maintaining and extending it. You are comparing the cost of three lines of code that handle some constrained data sources against a library that requires one line in build.sbt and handles almost all cases of CSV.

The real cost of those three lines is "well, I have to pre-check the data to see that it properly escapes embedded newlines, and make sure the file didn't come from Joe, who embeds commas, and ..."
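To make that hidden cost concrete, here is a minimal sketch of what goes wrong. The original 3-line function is not shown in the question, so `naiveParse` and the sample inputs below are hypothetical stand-ins, assuming the function simply split on newlines and commas.

```scala
object NaiveCsv {
  // Hypothetical stand-in for the original "3 line function":
  // split the text into lines, then split each line on commas.
  def naiveParse(text: String): List[List[String]] =
    text.split("\n").toList.map(_.split(",", -1).toList)

  // Well-behaved input: two records, two fields each.
  val clean = "name,city\nAnn,Oslo"

  // Joe's input: a quoted field that contains a comma.
  val embeddedComma = "name,city\n\"Smith, Joe\",Oslo"

  // A quoted field that contains a newline.
  val embeddedNewline = "name,note\nAnn,\"line one\nline two\""
}
```

With the clean input the naive parser behaves as intended. With `embeddedComma` the second record comes back with three fields instead of two, and with `embeddedNewline` the two records come back as three rows. Every such case is either a pre-check on the data or another patch to the function, which is the work a quote-aware CSV parser already does.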

Edit: Regarding nose-in-the-door library inclusion:

Developers will often include libraries for no other good reason than that there is a piece of useful functionality that they can use right here and right now. There are a number of libraries that offer utility in a certain area.

  • Using only, let's say, the range-checking functionality of Joda Time/Scala might be comparable to using a multimeter as a small hammer. It works, but it might be missing the point. On the other hand, maybe you want to migrate more of your environment to use the much superior time handling of that library. This is an organization-wide conversation that you need to have.
  • A much worse situation arises when a number of different libraries with substantially overlapping functionality get included. My favorite is the string-checking functionality available in Apache Commons, Guava, and half a dozen other compilations. One programmer at a time Googles for a function that does a particular job, and each selects a different library. This is an even more important organization-wide conversation.

This whole question is strongly related to sharing classes and algorithms across your organization. It requires conversations to find out requirements, analysis of duplications and, more useful, analysis of gaps.

Licensed under: CC-BY-SA with attribution