Thursday, November 13, 2014

balance between writing tools and research

To do research, we need tools, but we have to find the right balance between writing the tools we need and doing the research itself. Context switching between the two isn't instantaneous, and I suspect that is because they use very different skill sets and brain regions.
In my view, several reasons lead researchers to reinvent the wheel:
  • Short-term convenience: writing your own tool is easier and faster at the start, with no maintenance required, whereas reuse is harder at first but more productive in the long run (a trade-off between speed, reusability and maintenance)
  • "Not invented here" culture
  • Skill-set distance: the gap between coding (a programmer's skill) and research is smaller than the gap between leveraging existing code (an engineering skill) and research

To get a better balance between writing tools and doing research, I would like to make better use of existing tools rather than reinvent the wheel, for example:
  • algebird -> twitter/algebird · GitHub (Abstract Algebra for Scala running on Hadoop)
  • Vertica (HP) -> Analytics platform (fast and easy data views/pivots)
  • Apache Drill (self-service data exploration on Hadoop)
  • pandas -> high-performance, easy-to-use data structures and data analysis tools for the Python programming language

1) Algebird is an abstract algebra library for Scala, developed at Twitter and released under the ASL 2.0 license, that runs on Hadoop. It supports algebraic structures such as semigroups, monoids, groups, rings and fields, as well as standard functional constructs like monads. More interesting, though, are the probabilistic data structures and their accompanying monoids, which come out of the box.
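To make the monoid idea concrete, here is a minimal sketch in plain Python. It is not Algebird's API: the `Monoid` class, the `int_add` and `set_union` instances and the `aggregate_in_chunks` helper are hypothetical names made up for this illustration. The point it tries to show is why monoids matter on something like Hadoop: because the operation is associative and has an identity element, partial results computed on separate chunks can be combined into the same answer as a single pass.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element (zero) plus an associative binary operation (plus)."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, items: Iterable[T]) -> T:
        # Fold all items together, starting from the identity element.
        return reduce(self.plus, items, self.zero)


# Hypothetical instances, named for this sketch only.
int_add = Monoid(0, lambda a, b: a + b)
set_union = Monoid(frozenset(), lambda a, b: a | b)


def aggregate_in_chunks(monoid, items, chunk_size):
    """Sum each chunk separately, then combine the partial results."""
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    partials = [monoid.sum(chunk) for chunk in chunks]
    return monoid.sum(partials)


visits = [3, 1, 4, 1, 5, 9, 2, 6]
total_whole = int_add.sum(visits)
total_chunked = aggregate_in_chunks(int_add, visits, chunk_size=3)

# Exact distinct users via set union; a probabilistic structure would
# trade this exact set for a small, mergeable approximation.
users = [frozenset({u}) for u in ["ann", "bob", "ann", "cy"]]
distinct_users = aggregate_in_chunks(set_union, users, chunk_size=2)
```

Here `total_whole` and `total_chunked` agree, which is the property that lets the same aggregation run either on one machine or split across many. The set-union example is exact and grows with the data; as I understand it, the probabilistic structures keep the same "merge two partial results" shape but in bounded memory.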

2) Vertica: Big Data Analytics, No Limits, No Compromises

  • Live Aggregate Projections
  • Open Architecture
  • Blazing-Fast Analytics
  • Massive Scalability

3) Apache Drill is an open source, low-latency SQL query engine for Hadoop and NoSQL.

Learn how to use Prediction APIs and make Machine Learning work for you, without hiring an expert.
