
This page describes the various strategies for executing queries in MetaModel.

Native vs greedy execution

Of particular interest is in which cases MetaModel can delegate ("push down") query execution to a native query engine, and in which cases it has to execute the query in memory itself, typically with a greedy approach implemented in Java code supplied by MetaModel.
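To make the distinction concrete, here is a minimal sketch in plain Java. It does not use MetaModel's API: the `Backend` class, its `supportsNativeWhere` flag and the `rowsTransferred` counter are hypothetical stand-ins for a data store. When the condition can be pushed down, only matching records leave the backend; otherwise every record is transferred and the condition is evaluated in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch (not MetaModel's API): push a WHERE condition down
 * to the backend when it supports it, otherwise filter in memory.
 */
public class PushDownSketch {

    /** A toy backend holding integer records. */
    static final class Backend {
        final List<Integer> records;
        final boolean supportsNativeWhere;
        int rowsTransferred = 0; // how many records have left the backend

        Backend(List<Integer> records, boolean supportsNativeWhere) {
            this.records = records;
            this.supportsNativeWhere = supportsNativeWhere;
        }

        /** Native path: the backend filters before handing records over. */
        List<Integer> scanWhere(IntPredicate condition) {
            List<Integer> out = new ArrayList<>();
            for (int r : records) {
                if (condition.test(r)) {
                    out.add(r);
                    rowsTransferred++;
                }
            }
            return out;
        }

        /** Plain scan: every record is handed over. */
        List<Integer> scanAll() {
            rowsTransferred += records.size();
            return new ArrayList<>(records);
        }
    }

    /** Delegate the condition if possible, otherwise evaluate it client-side. */
    static List<Integer> select(Backend backend, IntPredicate condition) {
        if (backend.supportsNativeWhere) {
            return backend.scanWhere(condition);
        }
        List<Integer> result = new ArrayList<>();
        for (int r : backend.scanAll()) {
            if (condition.test(r)) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 12, 42, 7, 99);
        IntPredicate over10 = r -> r > 10;

        Backend nativeBackend = new Backend(data, true);
        Backend plainBackend = new Backend(data, false);

        System.out.println(select(nativeBackend, over10) + " transferred=" + nativeBackend.rowsTransferred);
        System.out.println(select(plainBackend, over10) + " transferred=" + plainBackend.rowsTransferred);
    }
}
```

Both paths return the same rows, [12, 42, 99]; the difference is how much data crosses the boundary (3 records with push-down vs. 5 without in this toy example).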

The following table documents the execution capabilities of the individual MetaModel modules. Each column represents a query type. The query types are:

  • Plain FROM: Simple queries of the form 'SELECT y FROM x'. Possible values:
    • streaming: The dataset is implemented in a truly streaming fashion.
    • paged: The dataset fetches pages/bulks of records.
    • in-memory: The dataset has to load ALL records into memory. This is inefficient and may cause out-of-memory errors.
  • Simple COUNT: Queries of the form 'SELECT COUNT(*) FROM x'. Possible values:
    • native: The module supports an efficient native method of getting the count. Some modules also support additional criteria on COUNT queries, e.g. 'SELECT COUNT(*) FROM x WHERE z', which is marked as 'native (incl. WHERE)'.
    • greedy: The module has to run through the dataset to do the counting. This is inefficient but usually has little memory impact.
  • Simple WHERE: Are simple WHERE conditions delegated natively, or are they evaluated client-side for each record?
  • Primary key lookup: Queries that look up records by their primary key, e.g. 'SELECT y FROM x WHERE x.id = 42'. 'no PK' means the data source has no notion of a primary key.
  • Groups and aggregates: Are GROUP BY and aggregation functions being delegated natively, or are they calculated in memory?
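The Plain FROM, Simple COUNT and Simple WHERE values above can be illustrated with a hypothetical sketch (plain Java, not MetaModel's API). `PagedIterator` fetches records one page at a time through an assumed `fetchPage(offset, limit)` function, which is how a "paged" dataset behaves; `greedyCount` walks every record, applies the WHERE condition client-side, and keeps only a counter in memory.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not MetaModel's API) of a "paged" dataset and a
 * "greedy" COUNT with a client-side WHERE condition.
 */
public class PagedCountSketch {

    /** Iterates records by fetching one page at a time from a backend. */
    static final class PagedIterator<T> implements Iterator<T> {
        private final BiFunction<Integer, Integer, List<T>> fetchPage; // assumed (offset, limit) -> page
        private final int pageSize;
        private List<T> page = List.of();
        private int indexInPage = 0;
        private int offset = 0;
        private boolean exhausted = false;
        int pagesFetched = 0;

        PagedIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
            this.fetchPage = fetchPage;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() {
            if (indexInPage < page.size()) {
                return true;
            }
            if (exhausted) {
                return false;
            }
            // Only one page is held in memory at any time
            page = fetchPage.apply(offset, pageSize);
            pagesFetched++;
            offset += page.size();
            indexInPage = 0;
            if (page.size() < pageSize) {
                exhausted = true; // a short (or empty) page means no more records
            }
            return !page.isEmpty();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.get(indexInPage++);
        }
    }

    /** Simulates a paged backend over an in-memory list (for illustration only). */
    static <T> PagedIterator<T> overList(List<T> data, int pageSize) {
        return new PagedIterator<>(
            (off, lim) -> data.subList(Math.min(off, data.size()), Math.min(off + lim, data.size())),
            pageSize);
    }

    /** Greedy COUNT: visit every record, keeping only a counter in memory. */
    static <T> long greedyCount(Iterator<T> records, Predicate<T> where) {
        long count = 0;
        while (records.hasNext()) {
            if (where.test(records.next())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        PagedIterator<Integer> records = overList(data, 4);
        long evens = greedyCount(records, r -> r % 2 == 0);
        System.out.println("count=" + evens + ", pages fetched=" + records.pagesFetched);
    }
}
```

With 10 records and a page size of 4, counting the even values returns 5 after three page fetches; memory use is bounded by the page size rather than by the size of the dataset, but every record still has to be read.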

|| Module || Plain FROM || Simple COUNT || Simple WHERE || Primary key lookup || Groups and aggregates ||
|| MetaModel-csv || streaming || greedy when exact, native when approximated || client-side || no PK || greedy ||
|| MetaModel-jdbc || streaming || native (incl. all variants) || native || native || native ||
|| MetaModel-excel || streaming (.xlsx), in-memory (.xls) || native || client-side || no PK || greedy ||
|| MetaModel-pojo || in-memory || native || client-side || no PK || greedy ||
|| MetaModel-couchdb || streaming || native || native || native || greedy ||
|| MetaModel-mongodb || streaming || native (incl. WHERE) || native || native || greedy ||
|| MetaModel-elasticsearch || paged || native || client-side* || native || greedy ||
|| MetaModel-hbase || streaming || native || client-side* || native || greedy ||
|| MetaModel-cassandra || paged || native || client-side* || greedy || greedy ||
|| MetaModel-json || streaming || greedy || client-side || no PK || greedy ||
|| MetaModel-xml || streaming (SAX), in-memory (DOM) || greedy || client-side || greedy || greedy ||
|| MetaModel-salesforce || paged || native (incl. WHERE) || native || native || greedy* ||
|| MetaModel-sugarcrm || paged || native || native || greedy || greedy ||

* = improvement is possible (even within the scope of MetaModel)
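For modules marked greedy in the last column, GROUP BY and aggregate functions are calculated in memory. A minimal sketch of what that involves, for a hypothetical query 'SELECT country, COUNT(*), SUM(amount) FROM orders GROUP BY country' (the `Order` and `Aggregates` types are illustrative assumptions, not MetaModel classes): every record is read once and folded into per-group running aggregates.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not MetaModel's API): evaluating
 * SELECT country, COUNT(*), SUM(amount) FROM orders GROUP BY country
 * in memory, as a module without native GROUP BY support has to.
 */
public class InMemoryGroupBySketch {

    /** One source record (illustrative). */
    static final class Order {
        final String country;
        final long amount;

        Order(String country, long amount) {
            this.country = country;
            this.amount = amount;
        }
    }

    /** Running aggregates for one group. */
    static final class Aggregates {
        long count;
        long sum;
    }

    static Map<String, Aggregates> groupBy(Iterable<Order> orders) {
        // TreeMap keeps the groups sorted by key; one entry per distinct country
        Map<String, Aggregates> groups = new TreeMap<>();
        for (Order o : orders) {
            Aggregates agg = groups.computeIfAbsent(o.country, k -> new Aggregates());
            agg.count++;
            agg.sum += o.amount;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("DK", 100), new Order("NL", 40),
            new Order("DK", 25), new Order("NL", 60), new Order("US", 10));
        groupBy(orders).forEach((country, agg) ->
            System.out.println(country + " count=" + agg.count + " sum=" + agg.sum));
    }
}
```

Memory grows with the number of distinct groups rather than the number of records, but every record still has to be fetched from the source, which is why native GROUP BY support (as in MetaModel-jdbc) is preferable.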

QueryExecutionStrategies (last edited 2014-11-11 17:52:33 by KasperSorensen)