Thursday, November 13, 2014

Balance between writing tools and research

To do research, we need tools, but we have to find the right balance between writing the tools we need and doing the research itself. Context switching between the two isn't instantaneous, and I suspect that's because they use very different skill sets and brain regions.

In my opinion, several reasons lead researchers to reinvent the wheel:

Writing your own tool is easier and faster at the start, with no maintenance required, whereas reuse is harder at first but more productive in the long run (a trade-off between fast, reusable and maintainable)
Not-invented-here culture
Skill-set distance (the distance between coding (programming) and research is smaller than the distance between leveraging existing code (engineering) and research)

To get a better balance between writing tools and doing research, I would like to make better use of existing tools instead of reinventing the wheel, for example:

Algebird -> twitter/algebird on GitHub (abstract algebra for Scala, running on Hadoop)
Vertica (HP) -> Analytics platform (fast and easy data views/pivots)
Apache Drill -> Self-service data exploration on Hadoop
pandas -> http://pandas.pydata.org/ (high-performance, easy-to-use data structures and data analysis tools for the Python programming language)

1) Algebird is an abstract algebra library for Scala, developed at Twitter, released under the ASL 2.0 license, and running on Hadoop. It supports algebraic structures such as semigroups, monoids, groups, rings and fields, as well as the standard functional things like monads. More interesting, though, are the probabilistic data structures and the accompanying monoids that come out of the box (Big Data).
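To make the monoid idea concrete, here is a minimal sketch in plain Python (not Algebird's Scala API). It assumes a toy Bloom filter stored as an integer bit set. The names Monoid, bloom_of, might_contain and partial_bloom, and the sizes NUM_BITS and NUM_HASHES, are all hypothetical and chosen only for illustration. The point is that when merging two partial results is associative and has an identity element, each shard of the data can be summarized independently and the summaries combined afterwards in any grouping.

```python
from dataclasses import dataclass
from functools import reduce
from hashlib import sha256
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element plus an associative combine function."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, items: Iterable[T]) -> T:
        # Fold all items together, starting from the identity element.
        return reduce(self.plus, items, self.zero)


# Toy Bloom filter parameters (illustrative values, not tuned).
NUM_BITS = 256
NUM_HASHES = 3


def bloom_of(item: str) -> int:
    """Return the bit set (as an int) for a single item."""
    bits = 0
    for seed in range(NUM_HASHES):
        digest = sha256(f"{seed}:{item}".encode()).digest()
        bits |= 1 << (int.from_bytes(digest[:8], "big") % NUM_BITS)
    return bits


def might_contain(bloom: int, item: str) -> bool:
    """True if the item may be present; false positives are possible."""
    probe = bloom_of(item)
    return (bloom & probe) == probe


# Merging two Bloom filters is a bitwise OR; the empty filter is 0.
bloom_monoid = Monoid(zero=0, plus=lambda a, b: a | b)


def partial_bloom(items: Iterable[str]) -> int:
    """Summarize one shard of the data into a single Bloom filter."""
    return bloom_monoid.sum(bloom_of(i) for i in items)


# Each shard is summarized on its own, then the summaries are merged.
shard_a = partial_bloom(["alice", "bob"])
shard_b = partial_bloom(["carol"])
merged = bloom_monoid.sum([shard_a, shard_b])
```

Because the merge is associative, the merged filter is identical to the one built from all items in a single pass, which is what makes this kind of structure convenient for map/reduce style jobs.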

2) HP Vertica Analytics platform

Big Data Analytics—No Limits, No Compromises

Live Aggregate projections
Open Architecture
Blazing-Fast Analytics
Massive Scalability

3) Apache Drill: an open-source, low-latency SQL query engine for Hadoop and NoSQL.

Apache Drill - Self-Service Data Exploration


Fraka6 Blog - No Free Lunch
