Some thoughts of a Machine Learning Practitioner on Software Development, Management, Team Building, Startups, Python, Agile Development, Data visualization... that will distract you from your end goals by making you less efficient but are critical to manage in order to succeed.
Don't forget that long-term adaptation to inefficient approaches can become your enemy. Let's try to empower others by sharing knowledge & personal experiences.
Thursday, November 13, 2014
Balance between writing tools and research
To do research, we need tools, but we have to find the right balance between writing the tools we need and doing the research itself. Context switching between the two isn't instantaneous, and I suspect it's because they use very different skill sets and brain regions.
In my view, several reasons lead researchers to reinvent the wheel:
Writing your own is easier and faster at the start, with no maintenance required; reusing is harder at first but more productive in the long run (a trade-off between speed, reusability, and maintenance)
"Not invented here" culture
Skill-set distance (the gap between coding (programmer) and research is smaller than the gap between leveraging existing code (engineering) and research)
To get a better balance between writing tools and doing research, I would like to make better use of existing tools instead of reinventing the wheel, such as:
Vertica (HP) -> Analytics platform (fast and easy data views/pivots)
Apache Drill (self-service data exploration on Hadoop)
pandas (http://pandas.pydata.org/) -> high-performance, easy-to-use data structures and data analysis tools for the Python programming language
1) Algebird is an abstract algebra library for Scala, developed at Twitter and released under the ASL 2.0 license, that runs on Hadoop. It supports algebraic structures such as semigroups, monoids, groups, rings and fields, as well as the standard functional constructs like monads. More interesting, though, are the probabilistic data structures and the accompanying monoids that come out of the box. (Big Data)
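To make the monoid idea concrete, here is a minimal sketch in plain Python of why these structures matter for big data. It is not Algebird's API (Algebird is a Scala library); the `Monoid` class, the `map_monoid` helper and the sample partitions are all hypothetical names I made up for illustration. The point is only that an associative `plus` with an identity `zero` lets each partition be aggregated on its own and the partial results merged afterwards.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical helper: an associative `plus` plus its identity `zero`."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, items: Iterable[T]) -> T:
        # Fold the items from the left, starting at the identity element.
        return reduce(self.plus, items, self.zero)


# Integer addition: zero is 0, plus is +.
int_add = Monoid(0, lambda a, b: a + b)


def map_monoid(value_monoid: Monoid) -> Monoid:
    """Build a monoid over dicts by combining values key by key."""
    def plus(left: dict, right: dict) -> dict:
        merged = dict(left)  # copy, so neither input is mutated
        for key, value in right.items():
            merged[key] = value_monoid.plus(
                merged.get(key, value_monoid.zero), value
            )
        return merged
    return Monoid({}, plus)


word_counts = map_monoid(int_add)

# Made-up data: two partitions that could live on different machines.
part_a = [{"spark": 1}, {"hadoop": 2}, {"spark": 3}]
part_b = [{"hadoop": 1}, {"drill": 5}]

# Aggregate each partition independently, then merge the partial results.
partial_a = word_counts.sum(part_a)
partial_b = word_counts.sum(part_b)
total = word_counts.plus(partial_a, partial_b)

print(total)
```

Because `plus` is associative, merging the two partial results gives the same answer as summing all the records in one pass, which is what makes this kind of aggregation easy to distribute.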