

8.
What is a row store?
• You've probably used one
  – MyISAM
  – InnoDB
  – Memory
  – TokuDB
  – Etc.

9.
What is a row store?
• All MySQL storage engines are row stores
  – The storage engine API is row oriented
  – This means the API always stores and retrieves entire rows from disk

10.
Materialization
• Row stores live "in a material world"
  – A row is a collection of column values (attributes)
  – A row store stores all the values together on disk
• That is, rows are materialized on disk
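
The on-disk materialization described above can be sketched in a few lines of Python. This is a toy illustration, not any engine's actual file format: a hypothetical three-column schema is packed row by row, so reading one row back necessarily touches all of its columns.

```python
import struct

# Hypothetical 3-column schema: (id BIGINT, qty INT, price DOUBLE).
# A row store writes each row's values contiguously ("materialized"):
ROW_FMT = "<qid"          # 8 + 4 + 8 = 20 bytes per row
rows = [(1, 5, 9.99), (2, 3, 4.5)]

row_file = b"".join(struct.pack(ROW_FMT, *r) for r in rows)

# Reading row 1 back means reading all of its columns at once:
offset = struct.calcsize(ROW_FMT) * 1
row = struct.unpack_from(ROW_FMT, row_file, offset)
print(row)  # (2, 3, 4.5)
```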

15.
Worst case scenario
• select sum(bigint_column) from table
  – 1,000,000 rows in table
  – Average row length 1024 bytes
• The select reads one bigint column (8 bytes)
  – But the entire row must be read to get one column
  – Reads ~1GB of table data for ~8MB of column data
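
The arithmetic behind that ~1GB figure, using the numbers from the slide:

```python
ROWS = 1_000_000
AVG_ROW_LEN = 1024    # bytes per row (all columns together)
BIGINT_LEN = 8        # bytes for the one column the query needs

row_store_io = ROWS * AVG_ROW_LEN    # a row store must read whole rows
useful_bytes = ROWS * BIGINT_LEN     # bytes actually summed

print(f"read {row_store_io / 2**30:.2f} GiB "
      f"for {useful_bytes / 2**20:.2f} MiB of column data")
# The waste factor is simply avg row length / column width: 1024/8 = 128x
```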

17.
Non-material world
• In a column store, rows are not materialized during storage
  – An entire row image exists only in memory
  – Each row still has some sort of "row id"

18.
What is a row?
• A row is a collection of column values that are associated with one another
• Associated?
  – Every row has some type of "row id"
    • PK column or GEN_CLUST_INDEX in InnoDB
    • Physical offset in a MyISAM table
    • In some databases it is just a virtual identifier for a row
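
A minimal sketch of the "virtual row id" idea: if each column is stored as its own array, the row id can simply be the shared position, and a full row exists only when you rebuild it in memory. The column names here are made up for illustration.

```python
# Hypothetical column store: each column is a separate array, and the
# implicit "row id" is the shared index into all of them.
col_id    = [101, 102, 103]
col_name  = ["ant", "bee", "cat"]
col_price = [1.0, 2.5, 0.75]

def materialize_row(row_id):
    """Rebuild a full row image in memory by probing every column."""
    return (col_id[row_id], col_name[row_id], col_price[row_id])

print(materialize_row(1))  # (102, 'bee', 2.5)
```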

21.
Column store benefit: Compression
• Effective compression reduces storage cost
• IO reduction yields decreased response times during queries as well
  – Queries may execute an order of magnitude faster compared to queries over the same data set on a row store
• 10:1 to 30:1 compression ratios may be seen
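
One reason columnar layouts compress so well is that similar values end up adjacent on disk. A rough demonstration with a generic compressor (zlib here stands in for whatever codec a column store actually uses) on a hypothetical low-cardinality status column:

```python
import zlib

# Hypothetical "status" column: one repeated value, 100,000 rows.
# Columnar storage puts all these identical values side by side,
# which generic compression exploits extremely well.
column = b"SHIPPED" * 100_000
ratio = len(column) / len(zlib.compress(column))
print(f"compression ratio roughly {ratio:.0f}:1")
```

Real tables mix columns of varying cardinality, which is why the slide's 10:1 to 30:1 range is more typical than this extreme case.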

22.
Best case scenario (IO savings)
• select sum(bigint_column) from table
  – 1,000,000 rows in table
  – Average row length 1024 bytes
• The select reads one bigint column (8 bytes)
  – Only the single column is read from disk
  – Reads ~8MB of column data instead of ~1GB of table data
  – Could read even less with compression!
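
The column-store side of this scenario can be sketched the same way: with a per-column file, the sum touches only that column's bytes. Again a toy layout, not any engine's real format:

```python
import struct

# Sketch: one BIGINT column stored as its own file (8 bytes per value).
bigints = [7, 11, 42]
col_file = b"".join(struct.pack("<q", v) for v in bigints)

# Summing the column scans only these 8-byte values; the other
# columns of the table are never read at all.
total = sum(v for (v,) in struct.iter_unpack("<q", col_file))
print(total, len(col_file))  # 60 24
```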

23.
Less than ideal scenario*
select *
from long_wide_table
where order_line_id = 321837291;
All columns, single row
* Possibly worst case. We'll see in the performance test, as this varies by column store.

24.
Column store downsides
• Accessing all columns in a table doesn't save anything
  – It could even be more expensive than a row store
  – Not ideal for tables with few columns

25.
More downsides
• Updating and deleting rows is expensive
  – Some column stores are append only
  – Others strongly discourage writes
  – Some column stores split storage into row and column areas (in Vertica* this is called WOS/ROS)
* Not open source; based on C-Store, an MIT project

27.
Analytics versus OLTP
• Row stores are for OLTP
  – Reading small portions of a table, but often many of the columns
  – Frequent changes to data
  – Small* amount of data (typically the working set must fit in RAM)
  – "Nested loops" joins are well optimized for OLTP
* < 2TB data sets in general
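
For readers unfamiliar with the term, a nested-loops join is exactly what the name suggests: loop over the outer row set and probe the inner side for each row. A minimal sketch, with a dict standing in for an index on the join key (table and column names are made up):

```python
# Nested-loops join sketch: for each outer row, probe the inner side.
orders = [(1, "alice"), (2, "bob")]                 # (order_id, customer)
items  = {1: ["widget"], 2: ["gadget", "gizmo"]}    # order_id -> items

result = [(oid, cust, item)
          for (oid, cust) in orders         # outer loop: small row set
          for item in items.get(oid, [])]   # indexed probe per outer row
print(result)
```

This is cheap when the outer side is small and the inner probe is indexed, which is the typical OLTP access pattern; it degrades badly when both sides are huge, which is why analytic databases favor other join strategies.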

28.
Analytics versus OLTP
• Column stores are designed for OLAP
  – Read large portions of a table in terms of rows, but often a small number of columns
  – Batch loading / updates
  – Big data*!
    • Effective compression and reduced query IO make column stores much more attractive for structured big data
    • Machine generated data is especially well suited
* 50TB – 100TB per machine is typically possible, before compression

29.
Not all column stores are for big data!
• In-memory analytics is popular
  – Some column stores are designed to be used as an in-memory store
  – It is possible to build very large SMP clusters which scale these databases to many machines, but this is outside the scope of this talk

30.
Indexing
• Row stores mainly use tree style indexes
  – When you create an index on a row store you usually create a "B" tree index
  – These indexes leverage binary search
  – Very fast as long as the index fits in memory

31.
Tree indexes
• A read is necessary to satisfy a write
  – Even with fast IO this is expensive
  – This means that tree indexes work best for in-memory data sets
  – Very large data sets usually end up with indexes which are unmanageably large

32.
Use fewer trees: Bitmap indexing
• Some column stores support bitmap indexes
  – Columnar storage lends itself very well to bitmaps
  – Bitmap indexes are effective for Boolean searches over multiple columns in a table
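
A bitmap index keeps one bit per row for each distinct value, so a multi-column Boolean predicate becomes a bitwise operation. A minimal sketch with invented data, using Python ints as bitsets:

```python
# Bitmap index sketch: one bit per row, per distinct column value.
rows = [("red", "S"), ("blue", "M"), ("red", "M"), ("red", "L")]

color_red = 0
size_m = 0
for i, (color, size) in enumerate(rows):
    if color == "red":
        color_red |= 1 << i
    if size == "M":
        size_m |= 1 << i

# WHERE color = 'red' AND size = 'M'  ->  AND the two bitmaps together
match = color_red & size_m
print([i for i in range(len(rows)) if match >> i & 1])  # [2]
```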

33.
Use fewer trees: Bitmap indexing
• Unfortunately
  – Bitmap indexes are not supported by any MySQL column store
  – They are very expensive to update

38.
MySQL column stores – InfiniDB Community
• InfiniDB community edition
  – Some data type limitations
  – Single threaded query*
  – Single threaded loader*
  – No indexes
  – Wrong results in testing
  – Hash join only
* You are required to bind InfiniDB to only a single core

41.
MonetDB
• Early (started circa 2004) column store
• Open source
• Designed for in-memory working sets
  – Indexes must fit in memory
  – Swap size must be large enough for the working set if it exceeds memory
• Some wrong results during testing