How large is ok for a Class?

https://softwareengineering.stackexchange.com/questions/11846

16-10-2019
|

문제

I’m a long time developer (I’m 49) but rather new to object oriented development. I’ve been reading about OO since Bertrand Meyer’s Eiffel, but have done really little OO programming.

The point is every book on OO design starts with an example of a boat, car or whatever common object we use very often, and they start adding attributes and methods, and explaining how they model the state of the object and what can be done with it.

So they usually go something like "the better the model the better it represents the object in the application and the better it all comes out".

So far so good, but, on the other hand, I’ve found several authors that give recipes such as “a class should fit in just a single page” (I would add “on what monitor size?" now that we try not to print code!).

Take for example a PurchaseOrder class, that has a finite state machine controlling its behavior and a collection of PurchaseOrderItem, one of the arguments here at work is that we should use a PurchaseOrder simple class, with some methods (little more than a data class), and have a PurchaseOrderFSM “expert class” that handles the finite state machine for the PurchaseOrder.

I would say that falls in the “Feature Envy” or “Inappropriate Intimacy” classification of Jeff Atwood's Code Smells post on Coding Horror. I’d just call it common sense. If I can issue, approve or cancel my real purchase order, then the PurchaseOrder class should have issuePO, approvePO and cancelPO methods.

Doesn’t that goes with the “maximize cohesion” and “minimize coupling” age old principles that I understand as cornerstones of OO?

Besides, doesn’t that helps toward the maintainability of the class?

해결책

A class should use the Single Responsibility Principle. Most very large classes I have seen do to many things which is why they are too large. Look at each method and code decide should it be in this class or separate, duplicate code is a hint. You might have an issuePO method but does it contain 10 lines of data access code for example? That code probably shouldn't be there.

다른 팁

In my opinion class size doesn't matter as long as the variables, functions and methods are relevant to the class in question.

The biggest problem with large classes is when it comes to testing those classes, or their interaction with other classes. Large class size by itself isn't a problem, but like all smells, it indicates a potential problem. Generally speaking, when the class is large, that's an indication that the class is doing more than a single class should.

For your car analogy, if class car is too large, that's probably an indication that it needs to be split into class engine, class doors, and class windshields, etc.

I don't think there is a specific definition of a class being too large. However, I would use the following guidelines to determine whether I should refactor my class or redefine the scope of the class:

Are there many variables that are independent of each other? (My personal tolerance is around 10~20 in most cases)
Are there many methods designed for corner cases? (My personal tolerance is ... well, it highly depends on my mood I guess)

In regards to 1, imagine that you define your windshield size, wheel size, tail light bulb model, and 100 other details in your class car . Although they are all "relevant" to the car, it is very difficult to trace the behavior of such class car. And when someday your colleague accidentally (or, in some case, intentionally) changes the windshield size based on the tail light bulb model, it takes you forever to find out why the windshield no longer fits into your car.

In regards to 2, imagine that you try to define all the behaviors a car may have in the class car, but car::open_roof() will be only available for convertibles, and car::open_rear_door() does not apply to those 2-door cars. Although convertibles and 2-door cars are "cars" by definition, implementing them in the class car complicates the class. The class becomes harder to use and harder to maintain.

Other than these 2 guidelines, I would suggest a clear definition of the class scope. Once you define the scope, implement the class strictly based on the definition. Most of the time, people get a "too large" class because of the poor definition of the scope, or because of the arbitrary features added to the class

Say what you need in 300 lines

Note: this rule of thumb assumes you're following the 'one class per file' approach which is in its own right a good maintainability idea

It's not an absolute rule, but if I see over 200 lines of code in one class I'm suspicious. 300 lines and warning bells come on. When I open up someone else's code and find 2000 lines, I know it's refactoring time without even reading the first page.

Mostly this comes down to maintainability. If you object is so complex that you can't express its behavior in 300 lines, it's going to be very difficult for someone else to understand.

I'm finding that I'm bending this rule in the Model -> ViewModel -> View world where I am creating a 'behavior object' for a UI page because it often takes more than 300 lines to describe the complete series of behaviors on a page. However I'm still new to this programming model and finding good ways of refactoring/reusing behaviors between pages/viewmodels which is gradually reducing the size of my ViewModels.

Let's get at it backward:

Doesn’t that goes with the “maximize cohesion” and “minimize coupling” age old principles that I understand as cornerstones of OO?

Actually, it's a cornerstone of programming, of which OO is but one paradigm.

I'll let Antoine de Saint-Exupéry answer your question (cause I am lazy ;) ):

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

The simpler the class, the more confident you'll be it's right, the easier it'll be to fully test it.

I'm with the OP on the Feature Envy classification. Nothing irritates me more than having a bunch of classes with a bunch of methods that work on some other class's internals. OO suggests that you model data and the operations on that data. That doesn't mean that you put the operations in a different place from the data! It's trying to get you to put the data and those methods together.

With the car and person models so prevalent, it'd be like putting the code for making the electrical connection, or even spinning the starter, into the person class. I would hope that this would be obviously wrong to any experienced programmer.

I don't really have an ideal class or method size, but I prefer to keep my methods and functions fitting on one screen so that I can see their entirety all in one place. (I have Vim configured to open a 60-line window, which happens to be right around the number of lines on a printed page.) There are places where this doesn't make a lot of sense, but it works for me. I like to follow the same guideline for class definitions, and try to keep my files under a few "pages" long. Anything over 300, 400 lines starts to feel unwieldy (mostly because the class has started taking on too many direct responsibilities).

When a class definition has a couple of hundred methods, it is too big.

I fully agree with the use of the single responsibility principle for classes but if that's followed and results in large classes then so be it. The real focus should be that individual methods and properties are not too large, cyclomatic complexity should be the guide there and following that may result in many private methods that will increase the overall class size but will leave it readable and testable to a granular level. If the code remains readable then size is irrelevant.

Issuing a purchase order is often a complicated process that involves more than just the purchase order. In this case, keeping the business logic in a separate class and making the purchase order simpler is probably the right thing to do. In general, though, you're right -- you don't want "data structure" classes. For example, your "POManager" business class' issue() method should probably call the PO's issue() method. The PO can then set its status to "Issued" and note the issue date, while the POManager can then send a notification to accounts payable to let them know to expect an invoice, and another notification to inventory so they know there's something coming in and the expected date.

Anybody who pushes "rules" for method/class size obviously hasn't spent enough time in the real world. As others have mentioned, it's often a refactoring indicator, but sometimes (especially in "data processing"-type work) you really do need to write a class with a single method that spans 500+ lines. Don't get hung up on it.

I’ve found several authors that give recipes such as “a class should fit in just a single page”

In terms of just the physical size of a class, seeing a large class is an indication of a code smell, but doesn't necessarily mean that there always are smells. (Usually they are though) As the pirates in the Pirates of the Caribbean would put it, this is more what you'd call "guidelines" than actual rules. I'd tried strictly following the "no more than one hundred LOC in a class" once and the result was a LOT of unnecessary over-engineering.

So they usually go something like "the better the model the better it represents the object in the application and the better it all comes out".

I usually start my designs with this. What does this class do in the real world? How should my class reflect this object as stated in its requirements? We add a bit of state here, a field there, a method to work on those fields, add a few more and voila! We've got a working class. Now fill in the signatures with the proper implementations and we're good to go, or so you think. Now we've got nice big class with all the functionality we need. But the problem with this is, it doesn't take the way the real world works into account. And by that I mean it doesn't take into account, "CHANGE".

The reason why classes are preferably split into smaller parts is that it's hard to change large classes later on without affecting the existing logic or introducing a new bug or two unintentionally or just make everything hard to reuse. This is where refactoring, design patterns, SOLID, etc come in to play and the end result is usually a small class working on other smaller, more granular sub classes.

Also, in your example, adding IssuePO, ApprovePO and Cancel PO into the PurchaseOrder class seems illogical. Purchase Orders don't issue, approve or cancel themselves. If PO should have methods, then the methods should be working on its state not on the class as a whole. They really are just data/record classes that other classes work on.

Large enough to contain all the necessary logic to perform it's job.

Small enough to be maintainable and follow the Single Responsibility Principle.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 softwareengineering.stackexchange