Saturday, April 4, 2015

How to create your ipython datascience notebook server fast?

ipython [notebook] is a powerful tool to share notes.

Here is how to start an ipython private notebook server:

  1. Install anaconda python distribution
  2. ipython notebook --pylab inline (works directly with anaconda python distribution)
  3. localhost:9000 (*open your favorite browser)

*If you need to run it on a restricted server: set up port forwarding localhost:9000 -> 8888 (so you can use a normal browser)
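As a rough sketch of the port-forwarding step, assuming the restricted server is reachable over SSH: the helper below only builds the `ssh` command line (it does not run it). The host name `user@restricted-server` and the function name are hypothetical placeholders.

```python
import shlex


def ssh_forward_command(host, local_port=9000, remote_port=8888):
    """Build an ssh command forwarding localhost:local_port to remote_port on host."""
    # -N: do not run a remote command, only forward ports
    # -L: local port forwarding (local_port -> localhost:remote_port on the server)
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", host]


# Hypothetical host name; replace with your own server.
cmd = ssh_forward_command("user@restricted-server")
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the tunnel is up, localhost:9000 in your local browser reaches the notebook listening on port 8888 on the server.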

Here is how to start a public ipython notebook on the grid:
  • ipython profile create nbserver
  • cd /home/$USER/.ipython/profile_nbserver/
  • generate a password hash: start ipython, then run: from IPython.lib import passwd; passwd()
  • openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
  • edit the notebook config file in ~/.ipython/profile_nbserver/
    • c = get_config()
    • c.NotebookApp.password = u'sha1:XXXXXXXXXXXXXX'
    • c.IPKernelApp.pylab = 'inline'  # if you want plotting support always
    • # Notebook config
    • c.NotebookApp.certfile = u'mycert.pem'
    • c.NotebookApp.ip = '*'
    • c.NotebookApp.open_browser = False
    • # It is a good idea to put it on a known, fixed port
    • c.NotebookApp.port = 8888
  • ipython notebook --profile=nbserver
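
To make the 'sha1:XXXXXXXXXXXXXX' value less mysterious, here is a minimal sketch of a salted hash, assuming the stored string has the form algorithm:salt:digest with the digest computed over passphrase + salt. This is my understanding of what passwd() produces; it is not IPython's own code, and make_hash / check_hash are hypothetical names. For the real server, use passwd() as in the steps above.

```python
import hashlib
import secrets


def make_hash(passphrase, salt=None, algorithm="sha1"):
    """Return 'algorithm:salt:digest' for the given passphrase (illustration only)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def check_hash(hashed, passphrase):
    """Recompute the digest with the stored salt and compare in constant time."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False  # not in the assumed algorithm:salt:digest form
    expected = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return secrets.compare_digest(expected, digest)


stored = make_hash("my secret passphrase")
print(stored.split(":")[0], check_hash(stored, "my secret passphrase"))
```

The point of the salt is that two users with the same passphrase still get different stored values, and the config file never contains the passphrase itself.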

How can you share notebooks?

  1. Create a public notebook (B)
  2. Leverage GitHub. Your ipynb files will be rendered if you commit them to GitHub.
  3. Use Sense (a collaborative platform to accelerate data science from exploration to production).


How to deploy a python datascience app on Heroku? numpy+scipy+pandas+sklearn+matplotlib


If you are looking for a way to deploy a datascience python app on Heroku, you might run into problems such as:

  • time out
  • numpy and scipy incompatibilities
After several iterations, here is a script to do it:

Here is what you will get:

Btw, I am currently experimenting with my datascience stuff on sense.io (Sense is a collaborative platform to accelerate data science from exploration to production).

Tuesday, November 25, 2014

NIPS 2014 - highlights

This year, NIPS is in my home town Montréal!
Here are the highlights

Most interesting invited talks:
University of California, Riverside

Posner Lectures

University of Pennsylvania
Princeton University

Friday Dec 12 highlights workshops

more details:

Workshop Friday 12 dec : 8:30am – 6:30pm Friday, December 12, 2014

Workshop Saturday 13 dec : 8:30am – 6:30pm Saturday, December 13, 2014

8:30am – 6:30pm: Shakir Mohamed, Tamara Broderick, Charles Blundell, Matthew D Hoffman, David Blei, Michael I Jordan
Advances in Variational Inference

8:30am – 6:30pm: Shivani Agarwal, Hossein Azari Soufiani, Guy Bresler, Sewoong Oh, David C Parkes, Arun Rajkumar, Devavrat Shah
Analysis of Rank Data: Confluence of Social Choice, Operations Research, and Machine Learning

8:30am – 6:30pm: Richard Baraniuk, Michael C Mozer, Divyanshu Vats, Christoph Studer, Andrew E Waters, Andrew Lan
Human Propelled Machine Learning

8:30am – 6:30pm: David S Choi, Aaron Clauset, Edo M Airoldi, Leto Peel, Johan Ugander
Networks: From Graphs to Rich Data

Conference other interesting talks:

Tuesday Dec 9th

Sunday, November 16, 2014

Agile tour 2014 highlights/notes

I went to my first "agile tour" conference. Why: I wanted to hear Jean-Marc De Jonghe (transforming management to succeed at an agile transformation) and David Hussman (Lessons in gravity: what keeps you down?).

Here are my highlights (yes, they are just notes):
Jean-Marc De Jonghe:

  • Wall was coming
    • slow decline of subscribers (~45 mins)
    • increase web (5-15 mins) -> less revenues
  • A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be. - Wayne Gretzky
  • smartphone web traffic: 86% apps, 14% browser
  • experience TV->laptop->tablet->cell (intimacy/privacy)
  • Agile vs old school
    • context vs control
    • confidence vs fear
    • context and vision
    • recruiting = attitude, aptitude, diversity
    • bidirectional agility (good luck)
    • build trust = quick win greedy approach
    • protect your team
    • Speed = disorder (tech debt); 
    • An environment that promotes agile
    • No split by job
    • refactoring = speed 
    • Estimate added value to set priorities
David Hussman:
  • gravity sucks!
  • launch rocket -> Egyptian approach = much easier 
  • uncertain/overly certain Product(uncertain) /techno (certain)
  • y = product learning, x = structure complexity
  • Accidental complexity-> incidental complexity 
  • mass has consequence, process mass, techno mass, mass of certainty, meeting mass
  • + nonban (least process+most measurable value)
  • better conversation not stories
  • diff between what they need and what you think they need
  • gource mine craft (visualize code)
  • point = size of effort (useless) size of value
  • books: Antifragile (same author as The Black Swan) + Founders at Work
  • investment discovery vs delivery
  • valuable vs feasible

Others: (great speakers -> Michel Céré, Richard Martin)
  • value not effort
  • Self manage team = team -> scrum master -> boss
  • no hierarchy + scrum master = authority 
  • People value the process more than the result
  • silent influence of authority (managers keep forgetting)
  • scrum master should do their work
  • propose: self-organization or a hierarchy request (it's a choice)
  • team should force people to take time off if required
  • culture = foundation of the social order
  • lived values + artifacts + presuppositions (macro culture, subculture, micro culture)
  • Cultural change & business objective
  • Operations culture (collaboration/communication), design/engineering culture (ideas, elegant systems), senior-management culture (finance, company image and self-image)
  • Importance of rituals
  • see openagileadoption
  • Dynamism/global coherence -> relevant response
  • agile: empowering groups + aligned execution framework + target = customer
  • WIP/JIT/Velocity (work in progress, just in time...)
  • TRG (taux de rendement global, i.e. overall equipment effectiveness): Toyota = 86%, office = 17%
  • Mobilizing leadership
  • QIX = Qty I.... Max
  • Strategic will

Thursday, November 13, 2014

balance between writing tools and research

To do research, we need tools, but we need to find the right balance between writing the tools we need and doing the research. Context switching between the two isn't instantaneous, and I suspect it's because they use very different skill sets and brain regions.
In my view, several reasons lead researchers to reinvent the wheel:
  • easier + faster at start + no maintenance required vs reuse = harder at first but more productive in the long run (trade-off fast/reusable/maintenance)
  • Not invented here culture
  • Skill-set distance (the distance between coding (programmer) and research is smaller than the distance between leveraging code (engineering) and research)

In order to get a better balance between writing tools and doing research, I would like to better leverage existing tools, rather than reinvent the wheel, such as:
  • algebird -> twitter/algebird · GitHub (Abstract Algebra for Scala running on hadoop)
  • Vertica (HP) -> Analytics platform (fast and easy data views/pivots)
  • Apache Drill (Self-service data exploration on hadoop)
  • pandas -> high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

1) Algebird is an abstract algebra library for Scala developed at Twitter, released under the ASL 2.0 license, that runs on hadoop. It has support for algebraic structures such as semigroups, monoids, groups, rings and fields, as well as the standard functional things like monads. More interesting, though, are the probabilistic data structures and the accompanying monoids that come out of the box. (Big Data)
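
Why monoids matter for big data can be sketched in a few lines. This is plain Python, not Algebird's API: the Stats class, its plus method and ZERO are hypothetical names chosen for illustration. Because plus is associative and ZERO is its identity, each chunk of data can be aggregated independently (for example, one chunk per hadoop mapper) and the partial results merged afterwards.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """A tiny monoid: count, sum and max of a stream of numbers."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def plus(self, other):
        # Associative combine: the grouping of the merges does not matter.
        return Stats(self.count + other.count,
                     self.total + other.total,
                     max(self.maximum, other.maximum))

    @staticmethod
    def of(x):
        # Lift a single value into the monoid.
        return Stats(1, float(x), float(x))


ZERO = Stats()  # identity element: ZERO.plus(s) == s


def aggregate(values):
    return reduce(Stats.plus, (Stats.of(v) for v in values), ZERO)


# Aggregate each chunk separately, then merge the partial results.
chunks = [[1, 2, 3], [4, 5], [6]]
partials = [aggregate(chunk) for chunk in chunks]
combined = reduce(Stats.plus, partials, ZERO)
print(combined.count, combined.total, combined.maximum)
```

The probabilistic data structures mentioned above follow the same pattern: as long as two sketches can be merged associatively, they can be built in parallel.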

2) Vertica: Big Data Analytics, No Limits, No Compromises

  • Live Aggregate projections
  • Open Architecture
  • Blazing – Fast Analytics
  • Massive Scalability

3) Apache Drill:  Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL.

 Learn how to use Prediction APIs and make Machine Learning work for you — without hiring an expert.

Saturday, October 11, 2014

faster learning/reading power by get

In the age of information, time is becoming our biggest asset. How to learn faster?

The first alternative is to read faster...

But have you thought of reading more concentrated content?
I have discovered, what an amazing source of concentrated information.

Here are 2 great books on crucial conversations and confrontations that might help you not derail in high-stakes situations that might change your life.
Crucial Confrontations, Crucial Conversations

Don't forget to apply the SuperMemo model (review concepts to assimilate them)
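
For the curious, here is a simplified sketch of how a SuperMemo-style schedule can space out reviews, in the spirit of the published SM-2 rules (first review after 1 day, second after 6 days, then the previous interval times an easiness factor). The function name and the default values are my own choices for illustration, and real SuperMemo versions are more elaborate.

```python
def sm2_review(quality, repetitions=0, interval_days=0, easiness=2.5):
    """Return (repetitions, next_interval_days, easiness) after one review.

    quality is a self-assessed grade from 0 (blackout) to 5 (perfect recall).
    """
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the sequence, keep the easiness factor unchanged.
        return 0, 1, easiness
    if repetitions == 0:
        interval = 1
    elif repetitions == 1:
        interval = 6
    else:
        interval = round(interval_days * easiness)
    # Easy answers raise the easiness factor, hard ones lower it (floor at 1.3).
    easiness = max(1.3, easiness + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return repetitions + 1, interval, easiness


# Three perfect reviews in a row: the gaps between reviews grow quickly.
state = (0, 0, 2.5)
intervals = []
for grade in (5, 5, 5):
    state = sm2_review(grade, *state)
    intervals.append(state[1])
print(intervals)
```

The takeaway for reading summaries: a concept reviewed a few times at growing intervals sticks much better than one read once.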