9 March 2015

Naive Bayes on Apache Flink

In this blog post we are going to implement a Naive Bayes classifier in Apache Flink. We are going to use it for text classification by applying it to the 20 Newsgroups dataset. To understand what is going on, you should be familiar with Java and know what MapReduce is. If you have seen and understood a word count example in any system, you're good to go. If you haven't heard of MapReduce or haven't seen the word count example, you may want to have a look at our introductory post "Hadoop and MapReduce" first.

4 March 2015

Hadoop and MapReduce


In this article we will briefly discuss the MapReduce computation paradigm and Apache Hadoop as one of its implementations. We won't go into much detail, and we won't even implement the Word Count example on Hadoop, but it should provide some foundation for future articles about tools for scalable data processing.

3 March 2015

The Dark Side of Entrepreneurship

"We will have more than a million clients, and our company will be the leader in its industry within the next year." This is what every first-time entrepreneur says at some point.

We often hear stories about young entrepreneurs who dropped out of school at a very young age and achieved huge success. We look at these very few success stories and, as entrepreneurs, we lie to ourselves that one day we will be like them...

You can usually recognise entrepreneurs as those who change jobs very frequently. They like to taste a bit of everything and, in the end, they don’t go deep into any one topic. They change countries, jobs and friends and, wherever they land, they seem to find something to do. They are proactive and extremely curious. They just don’t find their place in traditional companies. They are dreamers and born sellers, even when they have to sell things that they themselves cannot yet imagine.