Should import statements always be at the top of a module?

https://stackoverflow.com/questions/128478

02-07-2019
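Question

PEP 8 states:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

However, if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?

Isn't this: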

class SomeClass(object):

    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()

more efficient than this?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self):
        self.datetime = datetime.now()

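Solution

Module importing is quite fast, but not instant. This means that:

Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.

So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right?).

The best reasons I've seen to perform lazy imports are:

Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.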
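The optional-library case can be sketched roughly as follows. This is a minimal illustration, not code from the answer: ujson stands in for any optional third-party dependency, and the dumps wrapper is a hypothetical helper name.

```python
import json


def dumps(obj):
    """Serialize obj to a JSON string, preferring an optional faster library."""
    try:
        # Assumption for illustration: ujson is an optional third-party
        # package that may or may not be installed.
        import ujson
    except ImportError:
        # Fall back to the standard library instead of breaking.
        return json.dumps(obj)
    return ujson.dumps(obj)


text = dumps({"a": 1})
```

Because the import lives inside the function, a missing optional package only matters on the code path that would have used it.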

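Other tips

Putting the import statement inside a function can prevent circular dependencies. For example, if you have two modules, X.py and Y.py, and they both need to import each other, importing one of them triggers a circular import, which can fail with an ImportError or AttributeError because one of the modules is still only partially initialized. If you move the import statement in one of the modules into a function, it won't try to import the other module until the function is called, and by then that module has already been imported, so the cycle causes no problem. Read here for more: effbot.org/zone/import-confusion.htm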
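A self-contained sketch of the deferred import described above. The module names circ_x and circ_y are hypothetical; they are written to a temporary directory only so that the example runs on its own.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical module circ_x: imports its partner only inside the function.
X_SOURCE = textwrap.dedent("""
    def x_value():
        # Deferred import: runs when x_value() is called, by which time
        # circ_x itself has finished initializing.
        from circ_y import y_value
        return y_value() + 1
""")

# Hypothetical module circ_y: imports its partner at the top level.
Y_SOURCE = textwrap.dedent("""
    from circ_x import x_value

    def y_value():
        return 41
""")

tmpdir = tempfile.mkdtemp()
for filename, source in (("circ_x.py", X_SOURCE), ("circ_y.py", Y_SOURCE)):
    with open(os.path.join(tmpdir, filename), "w") as handle:
        handle.write(source)

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the freshly written files are seen

import circ_x

result = circ_x.x_value()  # imports circ_y on first call
```

If circ_x used a top-level "from circ_y import y_value" as well, importing circ_x would reach circ_y while x_value was not yet defined, and the from-import in circ_y would fail.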

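I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.

The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.

There is a speed penalty, as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.

It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally that I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.

I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.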
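A small sketch of that last pattern. The greeting logic is a made-up placeholder; only the placement of the sys import reflects the practice described.

```python
def main(args):
    """Build a greeting from a list of argument strings (placeholder logic)."""
    name = args[0] if args else "world"
    return "Hello, %s!" % name


if __name__ == "__main__":
    # sys is needed only when the file is run as a script, so it is
    # imported here rather than at the top of the module.
    import sys
    print(main(sys.argv[1:]))
```

Other code can import this module and call main(["Ada"]) directly, without the module itself ever importing sys.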

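Most of the time this would be useful for clarity and sensible to do, but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.

Firstly, you could have a module with a unit test of the form: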

if __name__ == '__main__':
    import foo
    aa = foo.xyz()         # initiate something for the test

Secondly, you might have a requirement to conditionally import some different module at runtime.

if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]

There are probably other situations where you might place imports in other parts of the code.

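The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".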
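One way to get a lazy import in current Python is the standard library's importlib.util.LazyLoader. The sketch below assumes Python 3.5 or later and is not necessarily the technique the linked page described; lazy_import is a helper name chosen here, and colorsys is just an arbitrary small standard-library module.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError("cannot find module %r" % name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # arranges the deferral; the module body has not run yet
    return module


# The name is bound up front, like a top-level import...
colorsys = lazy_import("colorsys")
# ...but the module is actually loaded here, on first attribute access.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

The name is visible at module scope, so readers still see the dependency, while the loading cost is paid only if the module is really used.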

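But there are reasons other than efficiency why you might prefer one over the other. The top-level import makes the module's dependencies much clearer to someone reading the code. The two also have very different failure characteristics: if there is no "datetime" module, the top-level import fails at load time, while the function-level import won't fail until the method is called.

Added note: in IronPython, imports can be quite a bit more expensive than in CPython, because the code is basically being compiled as it's being imported.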

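Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.

Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.

If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions and handle them in a reasonable manner.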
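A minimal sketch of dynamic loading with error handling. It uses importlib.import_module, which the Python documentation recommends over calling __import__ directly; load_optional and the missing module name are made up for illustration.

```python
import importlib


def load_optional(module_name):
    """Import a module by name, returning None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Handle the failure in whatever way suits the application;
        # returning None is just one reasonable choice.
        return None


decimal_module = load_optional("decimal")
missing_module = load_optional("no_such_module_for_this_sketch")
```

The caller can then check for None and degrade gracefully instead of crashing at an unexpected time.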

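I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.

In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.

One good reason to import a module elsewhere in the code is if it's used in a debugging statement.

For example: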

do_something_with_x(x)

I could debug this with:

from pprint import pprint
pprint(x)
do_something_with_x(x)

Of course, the other reason to import modules elsewhere in the code is if you need to import them dynamically. In that case you pretty much don't have any choice.

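It's a tradeoff that only the programmer can decide to make.

Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import "only when called" also means doing it "every time when called", so each call after the first one still incurs the additional overhead of doing the import.

Case 2 saves some execution time and latency by importing datetime beforehand, so that not_often_called() returns more quickly when it is called, and also by not incurring the overhead of an import on every call.

Besides efficiency, it's easier to see module dependencies up front if the import statements are up front. Hiding them down in the code can make it harder to find what modules something depends on.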

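Personally, I generally follow the PEP, except for things like unit tests that I don't want always loaded because I know they aren't going to be used except by test code.

Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.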

import os
# ...
try:
    kill = os.kill  # will raise AttributeError on Windows
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function

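(On review: what John Millikin said.)

This is like many other optimizations: you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful change, and you need the extra speed, then go for it. It would probably be good to put a note up with all the other imports: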

from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below

Module initialization only occurs once, on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency of a slew of other standard libraries. The import statement would then cost very little, since the module initialization would have happened already. All it does at that point is bind the existing module object to the local scope.
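This caching behaviour can be checked with a short sketch. The function name is made up; sys.modules is the interpreter's cache of already-imported modules.

```python
import sys


def import_inside_function():
    # After the first import, this statement finds the module in
    # sys.modules and simply binds it to a local name.
    import datetime
    return datetime


first = import_inside_function()
second = import_inside_function()

# Both calls return the very same cached module object.
same_object = (first is second) and (second is sys.modules["datetime"])
```

Repeated imports therefore cost a cache lookup and a name binding, not a second run of the module's code.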

Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.

Just to complete Moe's answer and the original question:

When we have to deal with circular dependencies we can do some "tricks". Assume we're working with modules a.py and b.py that contain x() and y(), respectively. Then:

We can move one of the from imports to the bottom of the module.
We can move one of the from imports inside the function or method that actually requires the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports into a plain import, such as: import a

So, to conclude: if you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top, for the reasons already explained in other answers to this question. And please, when using these "tricks", include a comment; it's always welcome! :)

In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.

This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.

I do not aspire to provide a complete answer, because others have already done this very well. I just want to mention one use case where I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During startup, the application walks through all the modules in that location and imports them, then looks inside the modules, and if it finds mounting points for the plugins (in my case, a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (dozens now, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third-party libraries at the top of my plugin modules was a noticeable penalty during application startup. Some third-party libraries are especially heavy to import (e.g. importing plotly even tries to connect to the internet and download something, which added about one second to startup). By optimizing imports in the plugins (performing them only in the functions where they are used) I managed to shrink the startup from 10 seconds to about 2 seconds. That is a big difference for my users.

So my answer is no, do not always put the imports at the top of your modules.

I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.

If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.

If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).

Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

The code:

from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print("    %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print("    %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1

It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.

There can be a performance gain from importing names, or otherwise binding them to local variables, inside a function. This depends on how the imported object is used inside the function. If you loop many times and access a module-global object, binding it to a local name first can help.

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

def callme():
  # Bind the module-level names to locals once, outside the loop
  x=X
  y=Y
  z=Z
  ladd=add
  for i in range(100000000):
    ladd(i)
    x+y+z

callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

Timing both scripts on Linux shows a small gain:

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
    0:17.80 real,   17.77 user, 0.01 sys
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
    0:14.23 real,   14.22 user, 0.01 sys

real is wall clock. user is time in program. sys is time for system calls.

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names

I would like to mention a use case of mine, very similar to those mentioned by @John Millikin and @V.K.:

Optional Imports

I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. On some occasions I need to import TensorFlow to do some quick model runs, but sometimes I work in places where TensorFlow isn't set up or is slow to import. In those cases, I encapsulate my TensorFlow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.

This way, I can do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow