
Python Concurrent.futures Imports Libraries Multiple Times (Executes Code In Top Scope Multiple Times)

For the following script (Python 3.6, Windows, Anaconda), I noticed that the libraries are imported as many times as the number of worker processes invoked, and print('Hello') is also executed that many times.
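(The asker's script isn't preserved here; the following is a hypothetical reconstruction of the kind of script that shows this behavior, with the pool creation already guarded but the imports and print left at top level.)

import concurrent.futures
import pandas as pd   # hypothetical: re-imported once per worker process on Windows

print('Hello')        # also runs once per worker process on Windows

def work(x):
    return x * x

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(work, range(8))))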

Solution 1:

concurrent.futures.ProcessPoolExecutor uses the multiprocessing module to do its multiprocessing.

And, as explained in the Programming guidelines, this means you have to protect any top-level code you don't want re-run in every process by putting it inside your __main__ block:

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

... one should protect the “entry point” of the program by using if __name__ == '__main__':

Notice that this is only necessary if using the spawn or forkserver start methods. But if you're on Windows, spawn is the default. And, at any rate, it never hurts to do this, and it usually makes the code clearer, so it's worth doing anyway.

You probably don't want to protect your imports this way. After all, the cost of calling import pandas as pd once per core may seem nontrivial, but that only happens at startup, and the cost of running a heavy CPU-bound function millions of times will completely swamp it. (If not, you probably didn't want to use multiprocessing in the first place…) And usually, the same goes for your def and class statements (especially if they're not capturing any closure variables or anything). It's only setup code that's incorrect to run multiple times (like that print('hello') in your example) that needs to be protected.
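In other words, the fix is usually just to leave the import, def, and class statements at module level and move only the one-time setup under the guard. A minimal sketch of the corrected script (same hypothetical names as above):

import concurrent.futures
import pandas as pd   # fine at top level: every worker needs it anyway

def work(x):
    return x * x

if __name__ == '__main__':
    print('Hello')    # one-time setup: now runs only in the parent process
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(work, range(8))))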


The examples in the concurrent.futures doc (and in PEP 3148) all handle this by using the "main function" idiom:

def main():
    ...  # all of your top-level code goes here

if __name__ == '__main__':
    main()

This has the added benefit of turning your top-level globals into locals, which makes sure you don't accidentally share them. That can be an especial problem with multiprocessing, where children inherit the parent's globals with fork but re-create them by re-importing the module with spawn, so the same code may work when you test on one platform but fail when you deploy on the other.
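A hypothetical sketch of that failure mode: a global reconfigured inside main() is seen by fork workers (they inherit the parent's memory) but not by spawn workers (they re-import the module and only ever see the top-level value):

import concurrent.futures

CONFIG = {'scale': 1}  # the only value spawn workers will ever see

def work(x):
    # fork workers inherit scale=10; spawn workers see scale=1
    return x * CONFIG['scale']

def main():
    CONFIG['scale'] = 10  # visible in the parent, invisible to spawn workers
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(work, range(4))))

if __name__ == '__main__':
    main()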


If you want to know why this happens:

With the fork start method, multiprocessing creates each new child process by cloning the parent Python interpreter and then just starting the pool-servicing function up right where you (or concurrent.futures) created the pool. So, top-level code doesn't get re-run.

With the spawn start method, multiprocessing creates each new child process by starting a clean new Python interpreter, importing your code, and then starting the pool-servicing function. So, top-level code gets re-run as part of the import.
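If you want to watch the difference, you can select the start method explicitly (fork and forkserver are Unix-only; on Windows only spawn is available). A minimal sketch:

import concurrent.futures
import multiprocessing

# Under spawn this line runs once per worker; under fork, only once in the parent.
print('importing in process:', multiprocessing.current_process().name)

def work(x):
    return x * x

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # try 'fork' on Unix to compare
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(work, range(4))))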
