IT Shared: Apache Spark

27 September 2016

A Beginner's Guide to Apache Flink – 12 Key Terms, Explained

Overview

In this post, I will go through 12 core Apache Flink concepts to better understand what it does and how it works. This article could perfectly serve as a beginner's overview of Flink and Streaming engine terminology.

1. What is Apache Flink?

At first glance, the origins of Apache Flink can be traced back to June 2008 as a researching project of the Database Systems and Information Management (DIMA) Group at the Technische Universität (TU) Berlin in Germany.

Apache Flink is an open source platform for distributed stream and batch data processing, initially it was designed as an alternative to MapReduce and the Hadoop Distributed File System (HFDS) in Hadoop origins.

According to the Apache Flink project, it is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.”

23 April 2015

Apache Mahout Samsara: The Quick Start

Last week the newest Apache Mahout 0.10 was released. One of the new features it has is a new math environment called “Samsara”, or Mahout Scala/Spark Bindings.

Samsara is a Linear Algebra library for Mahout. It’s written in Scala, which makes it possible to use operator overloading and it features nice R-like or Matlab-like syntax for basic Linear Algebra operations. For example, matrix multiplication is just X %*% Y. What is more, these operations can be distributed and run by an executing environment - currently by Apache Spark.

In this article we will see how to quickly set up a basic skeleton project and then we’ll try to do some very simple analysis on a 200 MB dataset.

	Ahmet Anıl Pala
	Alexey Grigorev
	Andrés Vivanco Villamar
	Andres Felipe Zamora Montaño
	Elena Samota
	Guven Toprakkiran
	Hicham Akaoka Badssi
	José Luis Pino López
	Madalina Burghelea
	Maximiliano Ariel López
	Mia Johnson Vioulès
	Navid Mahlouji
	Nyami Ronald Mitterand
	Steffi Melinda
	Stephany García Martínez
	Tamara Mendt