Scalable Data Processing: Put The Complexity Safety Belt On

MapReduce is a parallel, distributed programming model for processing large datasets on computing clusters. It is widely used in practice and has a number of open-source implementations of varying maturity. Although numerous parallel programming models have been proposed, most are theoretical constructs or have severe limitations that confine them to niches (e.g. GPUs). MapReduce is the first model that is (1) practically implementable and (2) almost unlimited in scope. The complexity of actual MapReduce algorithms is significantly harder to define, measure and study than the complexity of classic sequential algorithms. The literature offers only partial accounts of this complexity, e.g. the communication complexity measured as the number of tuples emitted by the Map phase. In this talk I will discuss the recently (2013) proposed notion of minimal MapReduce algorithms, which is probably the first complete answer to the question of MapReduce complexity. It covers all the relevant aspects: the complexity of the Map and Reduce functions as well as the communication complexity. Furthermore, it allows relating the real complexity of a MapReduce algorithm to that of the corresponding sequential algorithm.

Another interesting Google-originated programming paradigm, called Pregel, is concerned with processing large graphs. It has the same key attributes as MapReduce: it is parallel, distributed, scalable and, most importantly, implementable. I will also talk about the even more recent (2014) practical Pregel algorithms (PPA), which are the first holistic approach to the complexity of Pregel computations. PPAs are the Pregel counterparts of minimal MapReduce algorithms.
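To make the Map phase and its communication cost concrete, here is a minimal sequential sketch of the MapReduce model on the classic word-count problem. The function names and structure are illustrative only, not the API of any particular framework; the number of tuples yielded by the Map phase is exactly the communication complexity measure mentioned above.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit one (word, 1) tuple per word occurrence.
    The count of emitted tuples is the job's communication cost."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group intermediate tuples by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values of each key independently,
    so different keys can be processed on different machines."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["a b a", "b c"]
print(reduce_phase(shuffle(map_phase(docs))))  # {'a': 2, 'b': 2, 'c': 1}
```

In a real cluster the shuffle step is where data crosses the network, which is why minimal MapReduce algorithms bound it alongside the cost of the Map and Reduce functions themselves.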
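Pregel's vertex-centric model can likewise be sketched on a single machine. The toy simulation below (names and structure are my own, not Pregel's actual API) runs the standard max-value propagation example: in each superstep, every vertex that received messages may update its value and send messages to its neighbours; the computation halts when no vertex changes.

```python
def pregel_max(graph, init):
    """Toy superstep loop: graph maps vertex -> neighbour list,
    init maps vertex -> initial value. Every vertex converges to
    the maximum value in its connected component."""
    values = dict(init)
    # Superstep 0: every vertex sends its value to all neighbours.
    inbox = {v: [] for v in graph}
    for v in graph:
        for u in graph[v]:
            inbox[u].append(values[v])
    while True:
        # A vertex stays active only if an incoming message beats its value.
        changed = []
        for v, msgs in inbox.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                changed.append(v)
        if not changed:          # all vertices vote to halt
            return values
        inbox = {v: [] for v in graph}
        for v in changed:        # only changed vertices send messages
            for u in graph[v]:
                inbox[u].append(values[v])

graph = {1: [2], 2: [1, 3], 3: [2]}
print(pregel_max(graph, {1: 5, 2: 1, 3: 9}))  # {1: 9, 2: 9, 3: 9}
```

The number of supersteps and the volume of messages per superstep are the quantities that practical Pregel algorithms (PPA) bound, analogously to how minimal MapReduce algorithms bound rounds and shuffled tuples.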
Speaker (external to ISI): 
Krzysztof J. Stencel, prof. UW, MIMUW
Date and time: 
Thursday, June 26, 2014 - 14:15