27 September 2016
16 August 2016
The Master's Degree in Information Technologies for Business Intelligence (IT4BI) received its first generation of students in September 2012. One year later, the second generation, of which I was part, was welcomed to start its studies in Brussels as well.
Since its beginning, IT4BI has been facilitating learning, research and international collaboration in a positive and friendly atmosphere, thus providing the means for the personal and professional development of many people around the world.
By the end of September 2016, the fifth generation of IT4BI students will have started their studies in Brussels while the third generation completes their thesis defences and the fourth generation begins their specialisation semesters in Barcelona, Berlin and Paris.
In September 2017, IT4BI will become BDMA, which stands for "Big Data Management and Analytics". BDMA focuses on the new needs of research, education, and industry with respect to Big Data and will keep on receiving the support of the European Commission as part of the Erasmus+ Programme.
IT4BI has literally changed many lives, and there is no doubt that BDMA will keep on doing so. You may learn more about the BDMA programme on the official website.
26 December 2015
Codeforces Submissions Dataset
I wanted to do some analysis on source code, and I needed a dataset where code snippets are labeled with the programming language they are in. I scraped this data from codeforces.com, which is a website for holding programming contests. In this post, I share this data.
tl;dr Scroll down to get the links.
Beyond any shadow of a doubt, a sufficient amount of correct, relevant, concise and up-to-date information is a key input in any decision-making process. This not only applies to profit-driven organisations but it is also relevant for the non-profit sector.
For instance, in a non-profit organisation, efficient access to good-quality membership information is of utmost importance when defining membership strategies. Furthermore, good information is also crucial when it comes to translating strategies into tactics and, subsequently, turning the latter into action at the operational level.
17 December 2015
Test-Driven Machine Learning
The book “Test-Driven Machine Learning” by Justin Bozonier, published by Packt Publishing, is in print now. I was a technical reviewer of this book, and in this post you will learn some details about it. The book is available on the publisher’s website as well as on Safari Books Library.
19 October 2015
Data Science Interview Questions
Source: Data Science: An Introduction
Our IT4BI Master studies finished, and the next logical step after graduation is finding a job. I was interested in Data Science jobs and this post is a summary of my interview experience and preparation.
The term “Data Science” is not yet well established, so interviews for Data Science jobs might include a very broad range of questions, depending on how a particular company interprets the term. In this post I attempt to organize Data Science interview questions in a usable form, but the result might also be biased by how I see Data Science myself. I hope you find it useful too.
18 October 2015
Java Interview Questions
In the past, I was a Software Developer, and my primary programming language was Java. I quite often interviewed people, and sometimes I was the interviewee. In this post, I would like to share typical questions that you might expect at a job interview for a Java Developer position.
15 October 2015
The book "Mastering Data Analysis with R" by Gergely Daróczi, published by Packt Publishing, is in print now, and I had the pleasure of being a technical reviewer of this book.
If you're a Data Scientist who's looking to master R, this book is a good choice. It's already available on the publisher's website and on Safari books online.
11 June 2015
The French Institute for Research in Computer Science and Automation (INRIA) has awarded an ongoing IT4BI Master Thesis Project with the prize "Prix spécial du jury" in the context of the competition "Boost Your Code 2015".
The award was given for ElectioVis, an open source decision-aiding software tool that I started to develop as part of the IT4BI Decision Support and BI specialisation I am pursuing at École Centrale Paris. The project profits from the academic advisory of Prof. Valentina Ferretti, an expert in the decision-making field who lectures at École Centrale Paris and Polytechnic University of Turin.
ElectioVis is a website that aims to bring the power of decision-making closer to all citizens of the world, overcoming economic, social, cognitive and language barriers. It will be available online during June 2015 and everyone will be able to try it for free.
6 June 2015
The Four Fundamental Subspaces
This is the first blog post in the series “Fundamental Theorem of Linear Algebra”, where we are working through Gilbert Strang’s paper “The fundamental theorem of linear algebra” published by American Mathematical Monthly in 1993.
In this post, we will go through the first two parts of the Fundamental Theorem: the dimensionality and the orthogonality of the Fundamental Subspaces.
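As a compact reminder of what these two parts state, here is a summary in standard notation. The symbols are assumptions for illustration: A is an m × n real matrix of rank r, C(·) denotes a column space and N(·) a nullspace.

```latex
% Part 1: dimensions of the four fundamental subspaces
\dim C(A) = r, \qquad \dim C(A^{T}) = r, \qquad
\dim N(A) = n - r, \qquad \dim N(A^{T}) = m - r

% Part 2: orthogonality
N(A) = C(A^{T})^{\perp} \ \text{in } \mathbb{R}^{n}, \qquad
N(A^{T}) = C(A)^{\perp} \ \text{in } \mathbb{R}^{m}
```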
The Fundamental Theorem of Linear Algebra
This is a series of articles devoted to Gilbert Strang’s paper “The fundamental theorem of linear algebra” published by American Mathematical Monthly in 1993.
29 April 2015
23 April 2015
Apache Mahout Samsara: The Quick Start
Samsara is a Linear Algebra library for Mahout. It’s written in Scala, which makes it possible to use operator overloading, and it features a nice R-like or Matlab-like syntax for basic Linear Algebra operations. For example, matrix multiplication is just X %*% Y. What is more, these operations can be distributed and run by an execution environment, currently Apache Spark.
In this article we will see how to quickly set up a basic skeleton project and then we’ll try to do some very simple analysis on a 200 MB dataset.
9 March 2015
In this blog post we are going to implement a Naive Bayes classifier in Apache Flink. We are going to use it for text classification by applying it to the 20 Newsgroup dataset. To understand what is going on, you should be familiar with Java and know what MapReduce is. If you have seen and understood a word count example in any system, you're good to go. If you haven't heard of MapReduce or haven't seen the word count, you may first have a look at our introductory post "Hadoop and MapReduce".
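To give a feel for the technique before diving into the post, here is a minimal single-machine sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is written in plain Python rather than Flink, it is not the implementation from the post, and the function names and toy documents are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (label, text) pairs. Returns the counts the classifier needs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for label, text in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # add-one (Laplace) smoothing avoids zero probabilities
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration (not the 20 Newsgroup dataset)
toy_docs = [
    ("hockey", "the goalie made a great save"),
    ("hockey", "the team scored in overtime"),
    ("space", "the shuttle reached orbit"),
    ("space", "nasa launched a new satellite into orbit"),
]
model = train(toy_docs)
print(classify(model, "satellite in orbit"))
```

The counting in train is the part that maps naturally onto a word-count style job in a distributed system: per-label word frequencies are simply word counts keyed by (label, word).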
4 March 2015
In this article we will briefly discuss the MapReduce computation paradigm and Apache Hadoop as one of its implementations. We won't go into much detail, and we won't even implement the Word Count on Hadoop, but it should give some foundation for future articles about tools for scalable data processing.
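For readers who have never seen the Word Count, here is a minimal sketch of the idea in plain Python, run in a single process. It only mimics the map, shuffle and reduce phases and does not use Hadoop; the function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["to be or not to be"]))
```

In a real MapReduce system the map and reduce functions run in parallel on many machines, and the framework performs the shuffle between them.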
3 March 2015
"We will have more than a million clients and our company will be the leader in the industry within the next year". This is what every first-time entrepreneur says at some point in time.
We often hear stories about young entrepreneurs who dropped out of school at a very young age and had huge success. We look at these very few success stories and, as entrepreneurs, we lie to ourselves that one day we will be like them...
You normally recognise entrepreneurs as those who change jobs very frequently. They try a bit of everything and, in the end, they don’t get deep into any of the topics. They like to taste a bit of everything. They change countries, jobs and friends and it seems that, everywhere they land, they find something to do. They are proactive and extremely curious. They just don’t find their place in any of the traditional companies. They are dreamers and born sellers, even if they have to sell things not even they can imagine.
15 February 2015
The book "Spring Batch Essentials" by Packt Publishing is in print now, and I had the pleasure of being a technical reviewer of this book.
Spring Batch is a tool for creating ETL ("Read/Process/Write") jobs: batch processing of large portions of enterprise data that require sophisticated transformations and involve complex business logic. It lets you manage jobs easily, supports transactions and allows job execution to be scaled to process large volumes of data.
4 February 2015
Linear Algebra is a crucial prerequisite for many things, including Statistics, Data Mining, Machine Learning, Computer Vision, Image Processing and many many others, so it's very important to know the basics of Linear Algebra to understand more advanced concepts. For example, it's really helpful for our IT4BI studies, especially for the specialization at TU Berlin.
And the best time to learn Linear Algebra or refresh your knowledge of it is right now! At this moment there are a couple of nice MOOCs that have just started, and a few more are about to start in the near future.
Even if you don't join right now, they should be available in the future as self-paced versions. Additionally, I would like to include my favorite video courses on Linear Algebra, which you can also follow at your own speed with no deadlines.
29 January 2015
28 January 2015
24 January 2015
Additionally, I sometimes get questions by email about the program, and quite a lot of them are about the process: documents, motivation letters, etc. In this post I will also address these questions, and spare myself the trouble of typing the same text over and over again :)
15 January 2015
I started actively looking for programs that met these criteria, and now I would like to share my experience. In this blog post I list the programs that were most interesting to me, and I include only the ones that I applied to myself. I will not describe the programs and scholarships in detail, but will instead refer to links with information. However, I will add some things that I think are important, e.g. the application process, interesting details, etc.
7 January 2015
The first semester of the second year is devoted to specialization, and there are 3 possible choices in the IT4BI program. One of them is "Distributed and Large-Scale Business Intelligence" and it is delivered by the DIMA group at the Technical University of Berlin. It is the specialization of my choice, so I'm happy to share my experience about it in this post.
31 December 2014
I have published not only the executable library but also the complete source code.
26 December 2014
The first year of the IT4BI program is devoted to Business Intelligence fundamentals. There is plenty of information on the website of the program, including the course content page and the course description document. Even though these resources are comprehensive enough, I would still like to add a few details about the curriculum, so you can compare what is written on the website with the things we studied.