On big data

The other day I was searching for benchmarks of various ARM chips and stumbled upon a couple of publications that demonstrate, with a vivid example, what big data actually is, and what is utter horseshit.

One (https://inspirehep.net/literature/1424617):

Projects such as the Large Hadron Collider at CERN generate enormous amounts of raw data which presents a serious computing challenge. After planned upgrades in 2022, the data output from the ATLAS Tile Calorimeter will increase by 200 times to over 40 Tb/s.

Two (http://crm-en.ics.org.ru/uploads/crmissues/crm_2015_3/15728.pdf):

The term "Big Data" has caught on in the mainstream media and science worlds. While this word is now ubiquitous and almost exhausted in its use, it still identifies an important issue in the science community. Processing data is getting more difficult due to the sheer amount being produced. In the year 2022 the ATLAS detector will be upgraded and in doing so will produce on the order of Petabytes per second of raw data [ATLAS C 2012 Letter of Intent..., 2012]. There is no feasible way to process this much data in a reasonable amount of time. This is largely due to external Input/Output (I/O) bottlenecks present in current supercomputing systems. A team at the University of the Witwatersrand, Johannesburg is actively involved in the development of a computing system which is both cost-effective and able to provide high data throughputs on the order of Gigabits per second. There are four widely accepted computing paradigms. The first, and most commonly known, is the High Performance Computing paradigm (HPC), which is focused on the raw number of calculations performed per second. The second is Many Task Computing (MTC), which focuses on the number of jobs that can be completed in a given amount of time. Real Time Computing (RTC) involves very strict restrictions on execution times (such as air-bag sensors or process controls). Finally, a fourth paradigm called Data Stream Computing (DSC) involves the processing of large amounts of data with no off-line storage.

In short: if your processed data stream is under 40 Tb/s, or a single node of yours can't handle at least a few Gbit/s, kindly step aside and don't get in the way.
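The post's tongue-in-cheek admission bar can be sketched as a trivial check. The thresholds are taken straight from the quoted figures (40 Tb/s aggregate for ATLAS, "order of Gigabits per second" per node); the function and its name are my own illustration, not anything from the cited papers.

```python
# Thresholds from the quoted publications; everything else is a joke helper.
AGGREGATE_THRESHOLD_TBPS = 40.0  # ATLAS Tile Calorimeter output after the 2022 upgrade
PER_NODE_THRESHOLD_GBPS = 1.0    # "order of Gigabits per second" per node

def is_big_data(aggregate_tbps: float, per_node_gbps: float) -> bool:
    """Return True only if both quoted bars are cleared."""
    return (aggregate_tbps >= AGGREGATE_THRESHOLD_TBPS
            and per_node_gbps >= PER_NODE_THRESHOLD_GBPS)

# 40 Tb/s across the cluster, 5 Gbit/s per node: welcome to big data.
print(is_big_data(40.0, 5.0))    # True
# A modest 10 Gb/s cluster: step aside.
print(is_big_data(0.01, 0.5))    # False
```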
