Question

TLDR; I'm looking for ideas on how to flag code containing file names/paths that have inconsistent capitalisation with the actual file/directory.

Situation

I am migrating a significant code base written in an interpreted language from a Windows/OS X dev environment and a Windows prod environment to Linux for both prod and dev (the latter via Vagrant).

The problem I've discovered is that, over the years, various developers have been sloppy about making sure references to files match the capitalisation of the actual file.

This is not just an issue for including other code files, but also in referencing template names and auto-loading classes from their name. For example someone might render a template by referencing mytemplate but the actual file is called myTemplate.html, or they reference MyClass but the file is called Myclass.

Previously the developers failed to notice these instances because they were using case-insensitive file systems that hide the issue. Now that the production environment is case-sensitive, a heap of usually obscure bugs has arisen, caused by case inconsistencies.

Questions

  1. What is the best way to identify possible instances of this via static analysis of the code?

I've realised that dealing with this reactively (i.e. in response to bug reports) is insufficient. I need to find all the instances where this is possibly an issue and manually review them, but finding every instance is painstaking. I was hoping that there might be an existing static analysis tool for this kind of issue but I couldn't find one.

I was thinking of maybe putting something together that would list all file and directory names, then search the code for any references to these but with a different case. This wouldn't be particularly intelligent but it would probably work (although there would probably be a significant number of false positives from variable names, etc. to work through).

  2. How do I prevent this from happening in future?

While developers are now using a Linux dev environment, Vagrant shared folders are case-insensitive (on Windows hosts at least). In some instances I've been able to add "fast-fail" conditions (the code explicitly checks the case and fails if it doesn't match, even if the file operation would succeed), but I can't do that where file paths are passed directly into a native function, for example. Unit testing on a prod environment is one solution, but some of these issues occur within templates (e.g. somefile.js vs. someFile.js), which are hard to unit test.
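
For illustration, a "fast-fail" check of the kind described above might look roughly like the following. This is only a sketch in Python (the actual code base's language isn't stated), and the helper name assert_exact_case is hypothetical. It assumes every directory along the path can be listed, and it compares each path component against the spelling the file system reports.

```python
import os


def assert_exact_case(path):
    """Return the absolute path, or raise FileNotFoundError if any
    component is missing or spelled with different case on disk.

    Hypothetical helper: assumes every parent directory is listable.
    """
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    current = drive + os.sep  # "/" on POSIX, e.g. "C:\\" on Windows
    for part in rest.split(os.sep):
        if not part:
            continue
        # os.listdir returns names as stored on disk, so an exact string
        # comparison catches case mismatches even when the file system
        # itself would happily open the wrongly-cased path.
        if part not in os.listdir(current):
            raise FileNotFoundError(
                f"case mismatch or missing component {part!r} in {path!r}"
            )
        current = os.path.join(current, part)
    return path
```

It would be called just before the file operation, e.g. open(assert_exact_case(template_path)), where template_path is whatever the calling code already has. It costs one directory listing per path component, so it is probably best limited to dev and test runs.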


Solution

This is a task that defies static analysis. For example the not-unlikely code

def open_me(dir_path, filename, extension):
    return open(dir_path + filename + extension)

would be a tough one to catch. When faced with an insoluble problem, use heuristics. I'd go both directions with this.

  1. Assume that every filename in the tree is a candidate for being referenced in every file. I'd also add the bare name someFile as a candidate for every someFile.js, since references often leave off the extension. Then search the entire codebase, ignoring case, for instances of the candidates and report any whose case differs as possible defects which must be looked at. I recommend putting eyeballs in the loop because you don't want to catch comments and string literals which may not be filenames. Tedious? Sure.
  2. The second direction is to look at every open() or similar function for filename-like things. This is tedious too.
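
A rough sketch of the first direction, under some assumptions: the function names are made up for illustration, names shorter than three characters are skipped to cut noise, and a "token" is simply a run of letters, digits, underscores, dots and hyphens. It reports tokens that match a real file or directory name except for case. Expect false positives.

```python
import os
import re


def collect_candidates(root):
    """Map lower-cased name -> set of real spellings found under root.

    Both the full name (someFile.js) and the bare stem (someFile) are added.
    """
    candidates = {}
    for _dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            stem = os.path.splitext(name)[0]
            for spelling in {name, stem}:
                if len(spelling) >= 3:  # very short names match almost anything
                    candidates.setdefault(spelling.lower(), set()).add(spelling)
    return candidates


def find_case_mismatches(text, candidates):
    """Yield (line_number, token, real_spellings) for each token that equals
    a known name when case is ignored but matches no real spelling exactly."""
    token_re = re.compile(r"[A-Za-z0-9_.\-]+")
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in token_re.findall(line):
            # Check the whole token and its dot-separated pieces, so that
            # both "someFile.js" and "MyClass.method" are covered.
            for piece in dict.fromkeys([token] + token.split(".")):
                real = candidates.get(piece.lower())
                if real and piece not in real:
                    yield line_number, piece, sorted(real)
```

Run collect_candidates once over the project root, then feed each source file's text to find_case_mismatches and review the hits by hand. It will not catch paths assembled at run time, as in open_me above.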

There is only one way to prevent it happening in the future:

  1. Set a policy and enforce it through code reviews. You went to a new platform which has additional portability constraints. Because of the open_me gotcha above, additional discipline is needed by the development team.
Licensed under: CC-BY-SA with attribution