The common perception is that Python's implementation is slow, but you can often write fast Python if you know how to profile your code effectively.

I have tried it. I compared a highly CPU-intensive algorithm: training a simple neural network with one hidden layer. To do so, I used my old C++ neural network library (flayers) and an implementation in Python with NumPy. I wrote a simple neural net in Python and optimized all the loops with NumPy, as suggested in a profiling presentation I saw at PyCon 2009.
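To give an idea of what "profile first, then optimize the loops" looks like, here is a minimal sketch using the standard library's cProfile and pstats modules. It is not the code from bpnn.py: the functions weighted_sum, run_layer and profile_report are hypothetical names made up for this illustration, and the layer sizes are only borrowed from the network described in this post.

```python
import cProfile
import io
import pstats
import random


def weighted_sum(weights, inputs):
    """Pure-Python dot product of each weight row with the input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def run_layer(n_inputs=16, n_hidden=100, n_examples=200):
    """Push random examples through one hypothetical hidden layer."""
    rng = random.Random(0)
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_hidden)]
    total = 0.0
    for _ in range(n_examples):
        inputs = [rng.random() for _ in range(n_inputs)]
        total += sum(weighted_sum(weights, inputs))
    return total


def profile_report(func, limit=10):
    """Run func under cProfile and return the stats text, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    # The report shows which functions eat the time; those are the loops
    # worth rewriting with NumPy.
    print(profile_report(run_layer))
```

In a report like this, the inner loop over weights and inputs should dominate, which is the kind of hot spot that a single NumPy array operation can replace.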

I compared the training time of a simple fully connected neural network with 100 hidden neurons for 10 iterations on the letters dataset (cost function = mean squared error).
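For readers who want to picture the network, here is a rough NumPy sketch of a 16:100:26 fully connected net trained on the mean squared error. It is an illustration under my own assumptions, not the actual bpnn.py or flayers code: the sigmoid activation, the absence of bias terms, the weight initialization, the learning rate default and all function names are hypothetical. The forward pass is written twice, once with explicit Python loops and once with NumPy, to show the kind of rewrite being measured.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation (an assumption; the post does not name one)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward_loops(x, w_hidden, w_out):
    """Forward pass with explicit Python loops (the slow style)."""
    hidden = []
    for j in range(w_hidden.shape[1]):
        total = 0.0
        for i in range(w_hidden.shape[0]):
            total += x[i] * w_hidden[i, j]
        hidden.append(1.0 / (1.0 + np.exp(-total)))
    output = []
    for k in range(w_out.shape[1]):
        total = 0.0
        for j in range(w_out.shape[0]):
            total += hidden[j] * w_out[j, k]
        output.append(1.0 / (1.0 + np.exp(-total)))
    return np.array(output)


def forward_numpy(x, w_hidden, w_out):
    """Same forward pass, each layer as one matrix-vector product."""
    hidden = sigmoid(x @ w_hidden)
    output = sigmoid(hidden @ w_out)
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.01):
    """One gradient step on a single example; weights are updated in place.

    Returns the mean squared error measured before the update. The gradient
    used is that of 0.5 * sum of squared errors, which differs from the mean
    only by a constant factor that can be folded into lr.
    """
    hidden, output = forward_numpy(x, w_hidden, w_out)
    error = output - target
    delta_out = error * output * (1.0 - output)
    delta_hidden = (w_out @ delta_out) * hidden * (1.0 - hidden)
    w_out -= lr * np.outer(hidden, delta_out)
    w_hidden -= lr * np.outer(x, delta_hidden)
    return float(np.mean(error ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_hidden = rng.uniform(-0.5, 0.5, size=(16, 100))  # inputs -> hiddens
    w_out = rng.uniform(-0.5, 0.5, size=(100, 26))     # hiddens -> outputs
    x = rng.random(16)             # one made-up example with 16 features
    target = np.zeros(26)
    target[3] = 1.0                # one-hot target for a made-up class
    print(train_step(x, target, w_hidden, w_out))
```

Both forward functions compute the same numbers; the NumPy version simply pushes the two nested loops down into compiled code.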

Here is the time to do 10 iterations with flayers (C++):

./fexp / -h 100 -l 0.01 --oh -e 10

...

Optimization: Standard

Creating Connector [16|100] [inputs | hiddens]

Creating Connector [100|26] [hiddens | outputs]

...

real 0m11.187s

user 0m10.837s

sys 0m0.012s

Here is the time to do 10 iterations on the full letters dataset with Python:

time ./bpnn.py -e 10 --h 100 -f letters.dat -n

Creation of an NN <16:100:26>

...

real 85m48.646s

user 85m9.163s

sys 0m1.632s

Here is the time to do 10 iterations on the full letters dataset with Python and NumPy:

./bpnn.py -e 10 --h 100 -f letters.dat

Creation of an NN <16:100:26>

...

real 1m37.066s

user 1m36.026s

sys 0m0.100s

So if you do the math:

- The NumPy implementation is about 53 times faster than the basic Python implementation (85m48s vs. 1m37s).
- My C++ implementation is a little under 9 times faster than my simple Python NumPy implementation (1m37s vs. 11.2s).
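The ratios can be recomputed directly from the "real" times printed by the runs above. Here is a small sketch; the helper names to_seconds and speedup and the labels in REAL_TIMES are mine, and only the timing strings come from the measurements in this post.

```python
def to_seconds(stamp):
    """Convert a time-style stamp such as '85m48.646s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the three runs reported in this post.
REAL_TIMES = {
    "flayers (C++)": "0m11.187s",
    "python": "85m48.646s",
    "python + numpy": "1m37.066s",
}


def speedup(slow, fast):
    """How many times faster the `fast` run is than the `slow` run."""
    return to_seconds(REAL_TIMES[slow]) / to_seconds(REAL_TIMES[fast])


if __name__ == "__main__":
    print(round(speedup("python", "python + numpy"), 1))
    print(round(speedup("python + numpy", "flayers (C++)"), 1))
```

With these timings, the ratios come out at roughly 53 for NumPy over plain Python and roughly 8.7 for C++ over NumPy.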

The NumPy implementation is definitely worth it because it reduces the amount of code and has a significant performance impact. C++ might be required for extreme performance, but the trade-off in code complexity and development time may not be worth it. Now that I have the choice, I will still use my C++ lib.