Question

Edit: Since this question was asked a lot of improvement has happened in the standard Python scientific libraries (which was the targeted area). For example the numpy project has made a big effort to improve the docstrings. One can still argue if it would have been possible to address these issues continuously right from the start.


I have this somewhat heretic question: Why do so many Python libraries have messy code and don't follow standard best practices? Or do you think that this observation is absolutely not true? How does the situation compare to other languages? I am interested in your take on this.

Some reasons why I have the impression that quality is lacking:

  • The docstrings are often completely missing or incomplete, even for the public API. It is painful when a method takes *args and **kwargs but does not document which values can be given.

  • Bad Python coding practices, like adding new attributes outside of __init__. Things like this make the code hard to read (or to maintain).

  • Hardly any libraries follow the PEP8 coding conventions. Sometimes the conventions are not even consistent in a single file.

  • The overall design is messy, with no clear API. It seems that not nearly enough refactoring is done.

  • Poor unittest coverage.

Don't get me wrong, I absolutely love Python and its ecosystem. And even though I struggled with these libraries they generally get the job done and I am grateful for that. But I also think that in the end tons of developer time are wasted because of these issues. Maybe that is because Python gives you so much freedom that it is very easy to write bad code.

Was it helpful?

Solution

Regarding documentation, it's not just Python. If there is one single factor that is preventing the wider adoption of OSS it is, IMHO, the truly dreadful level of documentation of most OSS projects. This starts at the code level and extends to the user docs. Can I just say to anyone working on OSS:

a) Comment your code! There is no such thing as self documenting code!

b) Spend at least 25% of the project time budget on end-user documentation.

And I do know vaguely what I'm talking about - I have a couple of OSS projects of my own, I've contributed to several others and I use OSS almost exclusively. And yesterday I spent over 4 hours trying to build a major OSS project (no names, no pack drill), and failing because of the crappy, self-contradictory documentation.

OTHER TIPS

Instead the authors each seem to follow their own glorious convention. And sometimes the conventions are not even consistent with the same file of a library

Welcome to the wonderful code of the real world!

FWIW Python code I have met is no better or worse than that in any other language.

(Well, better than the average PHP project, obviously, but that's not really fair.)

The first thing you need to realize is that Python did not spring, fully formed, from the head of Guido sometime around version 2.x. It's grown over the course of the past twenty years.

In fact, a number of the things you mention (unittest, for example, and PEP-8), didn't even exist when some of the standard libraries were first written.

You'll probably notice that the older the library you're looking at, the more likely they are to have divergences from the current "best practices"--often because they predate widespread adoption of those practices. More recent libraries are more likely to conform to current practices.

Additionally, sometimes there is often a good reason for not bringing them up to date. Imagine you have several tens of thousands of lines of code written against the current Python libraries. Now, the maintainer of one of those libraries decides to change the libraries to make the class and function names conform to PEP-8. Now everyone who has working code has to revisit huge amounts of it, lest the renaming break things.

That's not to say there aren't things that can improve in Python libraries--there are! But there's always a trade-off between perfection and getting things done. That's one reason they say "Practicality beats purity."

This is because Python is not backed up by the corporate world like Java or .Net .

If I want my Java library to be promoted by Sun I will follow their guidelines. This is not the case with Python. I write my code, people find it better and it has to evolve on its own.

Also most Python developers are from C++, C, Java,.Net etc. And they start writing production code right from the first day. Thanks to easiness of Python. And the vicious cycle continues.

Even it took me a month to come to PEP8 and refactor my code.

PEP 8 has changed over time. Some modules follow older recommendations. You can see that with PIL, which uses modules like "Image" where the module contains a single main class, instead of the recommended lowercase for module names, and in C extensions which use the "c" prefix, rather than the more modern "_" prefix.

Some of the libraries are developed by people who are strongly influenced by traditions in other fields, like Java and C++. These people more often use CamelCase instead of the PEP 8 recommended lowercase_with_underscores.

The answers here wouldn't be complete without reference to Sturgeon's Law: "Ninety percent of everything is crap."

Ninety percent of [python libraries] are crud, but ninety percent of everything is crud

-- Sturgeons law (paraphrased)

It sounds like you have come to find that code quality does not meet the expectations you were set up to expect. Perhaps from school, or best practices books or senior developers.

After having worked at several companies, I found myself regularly advised to do unit tests, document code, use version/source control (all good advice that I have taken) then finding that the givers of that advice rarely follow the advice themselves.

I would say that you do have the right impression that sometimes the code quality is low, but only based on your expectations. Certainly numpy and others are quite useful packages, even if not coded to the standard you were set up to expect.

Standards are opinions, and if you are of the opinion that standards are low, then you can try to help make those standards better by contributing, or accept them as they are and be sure to write code that serves as an example to the juniors you will find yourself in charge of one day.

As for matplotlib, there is project to improve it's "pythoness" - http://www.scipy.org/PyLab

The thing about scientific libraries, is that they are written by scientist, no by professional software developers. Moveover, those scientist are used to write Fortran. The question is -- would you rather have working code or beautiful code?

PEP8 is a style guide, not a style requirement. It even states that you should "know when to be inconsistent". Personally, I don't like some of the guidelines in it. I prefer camelCase to snake_case since I'm more used to that.

And I don't often look at the source code of libraries I'm using unless it's absolutely necessary (such as debugging). I'd much prefer the documentation for said library to be adequate enough that I never have to look at the source.

Seriously, why do you really care what the source code for matplotlib looks like, as long as it does what it is intended to do?

I agree with you on the missing docstrings (assuming they're public elements rather than internal ones) but that's not the only way to document a library.

I believe that Python suffers from being hoisted too eagerly on people who are not programmers (by schooling or trade) as a solution for "need some programming done? Here, try this easy and mature tool".

Similarly to how PHP became such a huge success and with so many libraries with abysmal code quality (even if, granted, the average Python code quality is better then for PHP) - the average PHP user similarly to the average Python user has not much programming experience or skills and very little incentive to improve themselves in this regard - they set out to achieve something, and maybe they thought it is worthy enough to be shared with the community in the form of a library, but most often once the job is done they have no interest to better the code or better themselves (in programming skills, I mean).

The solution might be for Python library repositories (such as PyPI) to have stricter rules about accepting contributed packages - handle this with a review process whose purpose is to ensure quality - the same way that major Linux distributions have a review process before adding a package to their repositories.

PEP8 is just that, a convention, not a requirement. It would be really sad if all python programmers had to adhere to a common set of rules, we lose enthusiasm over the slightest of issues.

As far as missing docstrings are concerned - yes, they can help when using interactive help - but I generally don't mind as long as there's some documentation. I try not to read the source code of the libraries I use, I tend to start modifying (rewriting) them.

Regarding comparison with other languages, I think that language design plays a big part here. For example, in a strong-typed language like Java, even if the library is missing good documentation, you can still deduce much of the functionality from the method signatures. No *args to contend with.

How about a collection of examples of good software doc ?
Good examples might lead to overall improvement a bit faster than random walk.
The collection could be split into categories such as:
inline doc / help page / tutorial / reference manual, web page / paper, pictures / none.
Each entry should have a few words on why the reviewer finds it good.
(Where: a corner of stackoverflow ?)

nikow: I can only answer for myself, most of my Python (and PHP or Ruby, all dynamic "scripting" languages) work is done just for me - but I always release it on my personal site if anyone else finds it useful, but I never go through any documentation or QA process because as long as it works for me I'm happy.

Well, they are open source. As such they will also evolve over time, if they are good enough.

That's one of the many beauties of open source.

Often there is little sense in writing lot of documentation and "good" code if you don't know whether the project will live on. That would just be a waste of time.

Edit: Of course writing good code would never hurt the first time around though... But maybe just "getting the job done" is good enough in many cases. I think that otherwise we wouldn't enjoy the vast amount of options when it comes to OSS.

I think that if enough people act a specific way there might be some explanation to it. They are not just randomly doing so to offend you.

Quality of code * number of comments * time = constant

Pick two !

I never had any problem using matplotlib; can't say I looked at the code much - it is a fine library. Does what it is supposed to do (for free !)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top