Fraka6 Blog - No Free Lunch: python


Thursday, October 15, 2015

Summary of deeplearning libs available in python

Core: Theano http://deeplearning.net/tutorial/ (Torch vs Theano)
Framework:

Keras: Theano-based Deep Learning library http://keras.io/
Blocks and Fuel: Frameworks for deep learning (article) mila-udem · GitHub
theanets (numpy+sklearn+theano)

NervanaSystems/neon (built on top of numpy + YAML config files like Caffe + leverages nervana cpu)
uaca/deepy · GitHub (built on top of theano)

Saturday, April 4, 2015

How to deploy a python datascience app on Heroku? numpy+scipy+pandas+sklearn+matplotlib

If you are looking for a way to deploy a datascience python app on Heroku, you might run into problems like:

time out
numpy and scipy incompatibilities

After several iterations, here is a script to do it:

https://github.com/fraka6/trading-with-python/blob/master/create_heroku_datascience.sh

Here is what you will get:

Btw, I am currently experimenting with my datascience stuff on Sense.io (Sense.io is a collaborative platform to accelerate data science from exploration to production).

Friday, May 2, 2014

Looking for a simple way to add columns based on the other ones?

Are you looking for a simple way to add columns based on some other ones?
Let's say you want to add the column C where C=A+B.
You can do: cat *.tsv | ./coladd.py -a C=A+B

If you add more fields, separate your equations with ',' and quote them so the shell doesn't interpret the parentheses, like this:
cat *.tsv | ./coladd.py -a "C=A+B,D=(A/C)"

eval() and csv.DictReader have been leveraged to achieve this task.
code: coladd.py
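
The actual coladd.py is only linked, not shown, so the following is a minimal sketch of how eval() and csv.DictReader could be combined for this task, not the author's script. The function names `add_columns` and `coladd`, the conversion of every cell to float and the emptied `__builtins__` are assumptions made for illustration.

```python
import csv
import io


def add_columns(rows, equations):
    """Yield each row extended with the columns defined in `equations`.

    `equations` looks like "C=A+B,D=(A/C)"; a later equation may use a
    column created by an earlier one.
    """
    parsed = [eq.split("=", 1) for eq in equations.split(",")]
    for row in rows:
        # Assumption: every column is numeric, so convert cells to float.
        values = {name: float(value) for name, value in row.items()}
        for name, expr in parsed:
            # eval() executes arbitrary code: only pass trusted equations.
            values[name.strip()] = eval(expr, {"__builtins__": {}}, values)
        yield values


def coladd(tsv_text, equations):
    """Return `tsv_text` with the computed columns appended, as TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    new_names = [eq.split("=", 1)[0].strip() for eq in equations.split(",")]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + new_names,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in add_columns(reader, equations):
        writer.writerow(row)
    return out.getvalue()


result = coladd("A\tB\n1\t3\n2\t2\n", "C=A+B, D=(A/C)")
print(result)
```

Evaluating the equations in order against the same dictionary is what lets D reuse the freshly computed C.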

Monday, July 15, 2013

grepping gzip/bz2 files is annoying: -H option doesn't work

grep is quite a useful command-line tool, but some options don't work well with compressed files, like:
-H, --with-filename print the filename for each match

Neither bzgrep nor zgrep solves the problem; they have the same effect as:
bzcat *.bz2 | grep -H "something"

which generates this:
(standard input):

not:
filename:

So here is my zgrep.py (example: ls *.gz | zgrep.py "a <.*> (.*)")

#!/usr/bin/env python3
''' allow grep -H option on gzip & bz2 files '''

import sys
import gzip
import bz2
import re

exp = sys.argv[1]
expExtractor = re.compile(exp)

# file names are read from stdin, one per line
for filename in sys.stdin:
    filename = filename.strip()
    # open compressed files in text mode so lines are str, not bytes
    if filename.endswith('.gz'):
        freader = gzip.open(filename, 'rt')
    elif filename.endswith('.bz2'):
        freader = bz2.open(filename, 'rt')
    else:
        freader = open(filename, 'r')
    with freader:
        for i, line in enumerate(freader):
            line = line.strip()
            if expExtractor.search(line):
                print("%s:%i <%s>" % (filename, i, line))

Thursday, May 16, 2013

The simplest python server example ;)

Today, I was looking for a simple python server template but couldn't find a good one so here is what I was looking for (yes the title is a little arrogant ;):

#!/usr/bin/env python3

''' simple python server example;

    output format supported = html, raw or json '''
import sys
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

FORMATS = ('html','json','raw')
format = FORMATS[0]

class Handler(BaseHTTPRequestHandler):

    #handle GET command
    def do_GET(self):
        # pick the content type and body for the selected output format
        if format == 'html':
            content_type = 'text/html'
            body = "body"
        elif format == 'json':
            content_type = 'application/json'
            body = json.dumps({'path':self.path})
        else:
            content_type = 'text/plain'
            body = "%s\t%s" %('path', self.path)
        # always send a status line and headers so the response is valid HTTP
        self.send_response(200)
        self.send_header('Content-type', content_type)
        self.end_headers()
        self.wfile.write(body.encode('utf-8'))
        return

def run(port=8000):

    print('http server is starting...')

    #ip and port of server
    server_address = ('127.0.0.1', port)
    httpd = HTTPServer(server_address, Handler)

    print('http server is running...listening on port %s' %port)
    httpd.serve_forever()

if __name__ == '__main__':

    from optparse import OptionParser
    op = OptionParser(__doc__)

    op.add_option("-p", default=8000, type="int", dest="port",
                  help="port #")
    op.add_option("-f", default='json', dest="format",
                  help="format available %s" %str(FORMATS))
    op.add_option("--no_filter", default=True, action='store_false',
                  dest="filter", help="don't filter")

    opts, args = op.parse_args(sys.argv)

    format = opts.format
    run(opts.port)

Saturday, April 6, 2013

Finding the optimal K in k-means: an incremental k-means in python

I was looking for a good implementation of an incremental k-means where I don't have to set the optimal K. There are interesting papers (x-means, g-means, etc.) but I couldn't find any python implementation.

I decided to write an incremental version on top of sklearn.
The idea is simple:

Start at K=x
Identify the worst cluster based on an unsupervised measure (ex: silhouette)
Split the worst cluster into 2 clusters
Measure the global improvement with the new clusters
If you get an improvement, continue adding clusters
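
The steps above can be sketched as follows. This is not the code from mlboost/clustering/ikmeans.py, only a minimal illustration under a few assumptions: the function name `incremental_kmeans` and its parameters are hypothetical, the worst cluster is taken to be the one with the lowest mean silhouette, and the loop stops as soon as a split fails to raise the global silhouette score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score


def incremental_kmeans(X, k_start=2, k_max=10, seed=0):
    """Grow the number of clusters while the global silhouette improves."""
    labels = KMeans(n_clusters=k_start, n_init=10,
                    random_state=seed).fit_predict(X)
    best = silhouette_score(X, labels)
    while labels.max() + 1 < k_max:
        # Worst cluster = lowest mean silhouette among its points.
        per_point = silhouette_samples(X, labels)
        worst = min(np.unique(labels),
                    key=lambda c: per_point[labels == c].mean())
        mask = labels == worst
        if mask.sum() < 2:
            break  # a single point cannot be split
        # Split the worst cluster in two with a local 2-means.
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=seed).fit_predict(X[mask])
        candidate = labels.copy()
        candidate[np.flatnonzero(mask)[halves == 1]] = labels.max() + 1
        score = silhouette_score(X, candidate)
        if score <= best:
            break  # no global improvement: keep the previous clustering
        labels, best = candidate, score
    return labels, best


# Three well-separated synthetic blobs, starting deliberately at K=2.
rng = np.random.default_rng(0)
centers = [(0, 0), (10, 0), (0, 10)]
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
labels, score = incremental_kmeans(X, k_start=2)
print(len(np.unique(labels)), round(score, 3))
```

Splitting only the worst cluster keeps each step cheap, at the cost of never revisiting earlier splits.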

You can find the source code in mlboost/clustering/ikmeans.py
A special thanks to the scikit-learn lib for letting me prototype this version so fast.

Friday, November 23, 2012

Real-time face recognition experiment packages

In autumn 2009, I was a lecturer at ETS for an introductory machine learning class. To make sure the class could get a real feel for machine learning, I repackaged the digipy demo used for my presentation "Machine Learning Empowered by Python" for their final project. The latest code is here: https://bitbucket.org/fraka6/digiface
The digiface package was their recommended starting point. It is a real-time face recognition package, so they could focus on extracting the best features, easily train a single neural net and experiment live or on the dataset pictures.
The idea was simple: they would compete on the best real-time live face recognition of the students' own faces. Every student had to sit in front of each other's face recognition system. The best system had to be quite robust to changes in light, background and hair.
We had to hold a picture session to build the dataset.
One team built their own package called digijava. Here is a snapshot.

It was quite an interesting teaching experiment. I am glad to see that some of my students have followed my path and joined Yoshua Bengio's great LISA lab.

If you are looking for a great talk about the latest state of the art in machine learning, look at Hinton's "Brains, Sex, and Machine Learning" youtube video and Yoshua Bengio's slides "DeepLearning of Representations" (Google talk).

http://www./google-MTL-22-11-2012.pdf

Monday, January 10, 2011

How to create standalone python apps?

You might have to run your applications on your customer's infrastructure without wanting to give away your recipes (python source code), so here are the alternatives depending on your OS:

windows = py2exe
mac = py2app
linux = pyinstaller (freeze doesn't work->compile errors*)

On linux, pyinstaller works quite well, but you have to build the executable on the same distribution it will run on.

Here are the steps:

download latest version
python Configure.py
python Makespec.py /path/to/yourscript.py
python Build.py /path/to/yourscript.spec
start app: yourscript/dist/yourscript/yourscript(binary executable)

(*) Freeze instructions:

svn checkout http://svn.python.org/projects/python/trunk/Tools/freeze/
python freeze/freeze.py yourscript.py
make

Wednesday, October 13, 2010

simple multivariate classifier example using python & numpy

I was wondering how long it could take to write a multivariate classifier in python.

With python and numpy it isn't long. We simply need to be able to compute the covariance matrix, its determinant and its inverse. Even if the covariance matrix is singular, which means it can't be inverted, you can easily compute the pseudo-inverse (Moore-Penrose) instead (i.e.: numpy.linalg.pinv).
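
As a rough illustration of those ingredients (and not the 75-line program mentioned in this post), here is a minimal sketch of a classifier that fits one multivariate Gaussian per class. The class name `GaussianClassifier` and its methods are hypothetical. Using the product of the non-zero eigenvalues as a stand-in for the determinant when the covariance is singular is an assumption of this sketch.

```python
import numpy as np


class GaussianClassifier:
    """One multivariate Gaussian per class; predict the best-scoring class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            # pinv equals the inverse when cov is invertible and still
            # works when cov is singular.
            inv = np.linalg.pinv(cov)
            # Assumption: log of the (pseudo-)determinant, i.e. the sum of
            # the logs of the non-zero eigenvalues.
            eigvals = np.linalg.eigvalsh(cov)
            logdet = np.log(eigvals[eigvals > 1e-12]).sum()
            prior = len(Xc) / len(X)
            self.params_[c] = (mean, inv, logdet, prior)
        return self

    def scores(self, X):
        """Log-density per class, up to a constant shared by all classes."""
        columns = []
        for c in self.classes_:
            mean, inv, logdet, prior = self.params_[c]
            diff = X - mean
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
            columns.append(-0.5 * (logdet + maha) + np.log(prior))
        return np.column_stack(columns)

    def predict(self, X):
        return self.classes_[np.argmax(self.scores(X), axis=1)]


# Two synthetic Gaussian blobs centred on (0, 0) and (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(4, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = GaussianClassifier().fit(X, y)
pred = model.predict(np.array([[0.0, 0.0], [4.0, 4.0]]))
print(pred)
```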

As expected, assuming too much about the data leads to poor classification.

You can find a simple python program of 75 lines here.

Wednesday, July 21, 2010

patching class function in python

Today we had to patch a class function in production. Monkey patching can become tricky if references are kept in several places, like pointers in C and C++.
Here is a simple example on how to make sure all references will use the new definition.
class Foo:
  def f(self):
      print("default f")

def newf(self):
  print("newf")

# swap the code object in place so every existing reference to Foo.f runs newf
Foo.f.__code__ = newf.__code__
This is another reason why interpreted languages like python are so powerful.
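
To see why swapping the code object differs from simply rebinding the attribute, here is a small check written for Python 3, where the attribute is named `__code__`. The names `kept` and `alias` are illustrative, and the methods return strings instead of printing so that the results can be compared.

```python
class Foo:
    def f(self):
        return "default f"


def newf(self):
    return "newf"


foo = Foo()
kept = foo.f    # bound method captured before any patching
alias = Foo.f   # plain function reference captured before any patching

# 1) Rebinding the attribute only affects future lookups on the class:
#    references taken earlier still point at the old function.
original = Foo.f
Foo.f = newf
rebind_results = (foo.f(), kept(), alias(foo))

# 2) Restore the original, then replace its code object in place:
#    every reference shares that function object, so all of them change.
Foo.f = original
Foo.f.__code__ = newf.__code__
in_place_results = (foo.f(), kept(), alias(foo))

print(rebind_results)
print(in_place_results)
```

With plain rebinding, only the fresh lookup `foo.f()` sees the new behaviour; after the in-place swap, all three calls do.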

Friday, January 8, 2010

matplotlib & python for powerful data visualization

Here is an example of data that isn't obvious to analyze:
What does each party gain or lose from a seat-based system, compared with proportional representation? Legislative assemblies are usually decided by percentage of seats. It is the process used in Canadian and Québec elections.

Powerful visualization lets you see the effect easily. Python & matplotlib is an amazing combination to do so. It took me 20 minutes to visualize the effect in the 2008 federal and Quebec elections.

Upper graph (seats vs votes): shows how far the seat share departs from the proportional vote % when you use a seats approach. As an example, the Liberals gain ~11% and the ADQ loses ~11%.
Lower graph (lost seats vs votes): the real impact on a party is the ratio of this loss to its real vote proportion. In this example, it is a gain of ~25% for each Liberal vote (~11 points divided by a vote share of ~42%, i.e. the 66/125 = 52.8% seat share minus the ~11-point gain) and a loss of 66% for the ADQ and ~88% for QS.
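
The arithmetic behind the lower graph can be sketched as below. The function name `seat_distortion` is hypothetical, and the Liberal vote share is not stated in this post: it is inferred here from the 66/125 seat share and the quoted ~11-point gain, so treat the numbers as approximate.

```python
def seat_distortion(seats, total_seats, vote_share):
    """Return (seat share - vote share, that gap relative to the vote share)."""
    seat_share = seats / total_seats
    gain = seat_share - vote_share
    return gain, gain / vote_share


# 66 of 125 seats comes from the text above. The vote share is an
# assumption: seat share minus the quoted ~11-point gain.
seat_share = 66 / 125
assumed_vote_share = seat_share - 0.11
gain, relative_gain = seat_distortion(66, 125, assumed_vote_share)
print(f"gain: {gain:.1%}, relative to votes: {relative_gain:.1%}")
```

Dividing by the vote share rather than the seat share is what turns an 11-point gap into roughly a quarter more weight per Liberal vote.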

Basically:

In the Canadian election, PC & BQ gain power, but BQ far more in proportion, and the Greens lose everything
In the Quebec election, QS & ADQ lose a lot of power and PQ and LIB gain it: it might explain why they aren't talking about changing the election formula
matplotlib and python are an amazing combination to automate data visualization

To get the code, do:

svn co https://mlboost.svn.sourceforge.net/svnroot/mlboost/elections
python elections/seats_vs_prop.py

Gerrymandering Explained (youtube)
Gerrymandering - another reason why rep democracy is fundamentally corrupt (http://bit.ly/qO4mpH)