Python concurrent.futures Imports Libraries Multiple Times (Executes Code in Top Scope Multiple Times)
Solution 1:
concurrent.futures.ProcessPoolExecutor uses the multiprocessing module to do its multiprocessing. And, as explained in the Programming guidelines, this means you have to protect any top-level code you don't want to run in every process inside your __main__ block:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
... one should protect the “entry point” of the program by using
if __name__ == '__main__':
…
Notice that this is only necessary if using the spawn or forkserver start methods. But if you're on Windows, spawn is the default. And, at any rate, it never hurts to do this, and usually makes the code clearer, so it's worth doing anyway.
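As a minimal runnable sketch of the guarded pattern (the script text and the run_script helper below are illustrative, not from the original answer): the worker function lives at top level, where spawned children can import it, while the pool creation sits behind the __name__ guard. The demo forces the spawn start method so it behaves like Windows on any platform, and runs the script in a subprocess so its behavior is observable.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Illustrative script using the guarded "entry point" pattern.
GUARDED_SCRIPT = textwrap.dedent("""
    import concurrent.futures
    import multiprocessing

    def square(x):                      # top level: children must be able to import this
        return x * x

    if __name__ == '__main__':          # runs only in the parent process
        multiprocessing.set_start_method('spawn')   # force the Windows default
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as ex:
            print(sorted(ex.map(square, range(5))))
""")

def run_script(source):
    """Write `source` to a temp file and run it with the current interpreter."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, check=True)
        return proc.stdout
    finally:
        os.unlink(path)

if __name__ == '__main__':
    print(run_script(GUARDED_SCRIPT))   # [0, 1, 4, 9, 16]
```

Because the pool creation is guarded, each spawned child can re-import the script without recursively creating more pools.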
You probably don't want to protect your imports this way. After all, the cost of calling import pandas as pd once per core may seem nontrivial, but that only happens at startup, and the cost of running a heavy CPU-bound function millions of times will completely swamp it. (If not, you probably didn't want to use multiprocessing in the first place…) And usually, the same goes for your def and class statements (especially if they're not capturing any closure variables or anything). It's only setup code that's incorrect to run multiple times (like that print('hello') in your example) that needs to be protected.
The examples in the concurrent.futures docs (and in PEP 3148) all handle this by using the "main function" idiom:

def main():
    # all of your top-level code goes here

if __name__ == '__main__':
    main()
This has the added benefit of turning your top-level globals into locals, to make sure you don't accidentally share them (which can especially be a problem with multiprocessing, where they actually get shared with fork, but copied with spawn, so the same code may work when testing on one platform, but then fail when deployed on the other).
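A hedged sketch of that pitfall (the script text and run_script helper are mine, not the answer's): a module-level global mutated in the parent after import reverts to its import-time value in a spawned child, because the child re-imports the module; under fork the child would inherit the mutated value instead.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Illustrative script: a module-level global mutated only in the parent.
GLOBALS_SCRIPT = textwrap.dedent("""
    import multiprocessing as mp

    SETTINGS = {'mode': 'default'}   # value at import time

    def report_mode(_):
        return SETTINGS['mode']

    if __name__ == '__main__':
        mp.set_start_method('spawn')
        SETTINGS['mode'] = 'tuned'   # parent-only change, made after import
        with mp.Pool(1) as pool:
            print('child sees:', pool.apply(report_mode, (None,)))
        print('parent sees:', SETTINGS['mode'])
""")

def run_script(source):
    """Run `source` with the current interpreter and return its stdout."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(source)
        path = f.name
    try:
        return subprocess.run([sys.executable, path], capture_output=True,
                              text=True, check=True).stdout
    finally:
        os.unlink(path)

if __name__ == '__main__':
    print(run_script(GLOBALS_SCRIPT))
    # child sees: default   (the spawned child re-imported the module)
    # parent sees: tuned
```

With fork, the child is cloned after the mutation, so both lines would say 'tuned'; this is exactly the kind of code that works in testing on Linux and fails on Windows.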
If you want to know why this happens:
With the fork start method, multiprocessing creates each new child process by cloning the parent Python interpreter and then just starting the pool-servicing function up right where you (or concurrent.futures) created the pool. So, top-level code doesn't get re-run.
With the spawn start method, multiprocessing creates each new child process by starting a clean new Python interpreter, importing your code, and then starting the pool-servicing function. So, top-level code gets re-run as part of the import.
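To see that re-import concretely, here's a hedged sketch (script text and run_script helper are illustrative, not from the answer): an unguarded top-level print fires once in the parent and once more when the single spawned worker re-imports the module.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Illustrative script with an unguarded top-level print.
SPAWN_SCRIPT = textwrap.dedent("""
    import multiprocessing as mp

    print('top-level code ran')      # NOT protected by the __main__ guard

    def double(x):
        return 2 * x

    if __name__ == '__main__':
        mp.set_start_method('spawn')
        with mp.Pool(1) as pool:     # exactly one worker process
            print(pool.map(double, [1, 2, 3]))
""")

def run_script(source):
    """Run `source` with the current interpreter and return its stdout."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(source)
        path = f.name
    try:
        return subprocess.run([sys.executable, path], capture_output=True,
                              text=True, check=True).stdout
    finally:
        os.unlink(path)

if __name__ == '__main__':
    out = run_script(SPAWN_SCRIPT)
    # once in the parent + once in the single spawned worker's re-import
    print(out.count('top-level code ran'))   # 2
```

Switching the script to the fork start method (where available) would make the print fire only once, since fork clones the parent instead of re-importing the module.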