A schema defines a multi-dimensional database. It contains a logical model, consisting of cubes, measures, dimensions and attributes, and a mapping of this model onto a physical model.
The logical model consists of the constructs used to write queries in MDX language: cubes, measures, dimensions, attributes, hierarchies, levels, and members.
The physical model is the source of the data which is presented through the logical model. It is a set of tables and other relational expressions, connected using links. Typically the tables are stored in a relational database and arranged in a star or snowflake schema; later, we shall see examples of other kinds of mappings.
Mondrian schemas are represented in an XML file. An example
schema, containing almost all of the constructs we discuss here, is
supplied as demo/FoodMart.mondrian.xml
in the Mondrian
distribution. The dataset to populate this
schema can be downloaded
from Pentaho.
Currently, the only way to create a schema is to edit a schema XML file in a text editor. The XML syntax is not too complicated, so this is not as difficult as it sounds, particularly if you use the FoodMart schema as a guiding example.
The structure of the XML document is as follows:
<Schema>
<Table>
<Key>
<Column>
<Query>
<SQL>
<Key> (as for Table)
<Rows>
<Row>
<Value>
<Link/>
<Dimension/> (shared; as for Dimension within Cube)
<Cube>
<Key>
<Column/>
<Name/> (as Key)
<Caption/> (as Key)
<OrderBy/> (as Key)
<Closure/>
<Script/>
<Property>
<Script/>
<Levels>
<Level/>
<Measures>
<Measure/>
<MeasureRef/>
<FactLink/>
<CopyLink/>
<NoLink/>
<Formula/>
<Script/>
<NamedSet>
<Formula/>
<Role>
<MemberGrant/>
<Union>
<RoleUsage/>
<Script/>
<Parameter/>
The order of XML elements is not important. For example, as long
as it occurs as a child of a <Schema>
element, a
<UserDefinedFunction>
element might occur after one
<Cube>
element and after another.
This is different to Mondrian version 3.x, where order of elements was very important. For other changes with previous versions of the schema, see the release notes.
The content of each XML element is described in Appendix A and in the XML schema.
The <Schema>
element
is the root element of every Mondrian schema. For example:
<Schema name="Rock Sales" metamodelVersion="4.0">
A schema has a name
attribute and, optionally,
a description
. Other attributes are described in
the XML schema, which you can reach via
the element description
the Appendix A. (We shall not
describe every attribute of in this guide, just the most important
ones, so get into the habit of clicking on that link!)
The metamodelVersion
attribute allows Mondrian to tell which
version the schema was intended for; if the version is different than
the current version of the software, Mondrian may be able to convert
it automatically. The current version is "4.0". If the
attribute is missing (it was optional before Mondrian version 4)
Mondrian does its best to deduce the version of the schema based on
its contents.
The major element types (schema, cube, shared dimension, dimension, attribute, hierarchy, level, measure group, measure, calculated member) support annotations. An annotation is a way of associating a user-defined property with a metadata element, and in particular, allows tools to add metadata without extending the official Mondrian schema.
Create an <Annotations>
element as a child of the element
you wish to annotate, then include a number of <Annotation>
elements. <Annotation>
elements' names must be unique within
their element. If you are adding annotations to support a particular
tool that you maintain, choose annotation names carefully, to ensure
that they do not clash with annotations used by other tools.
The following example shows "Author" and "Date"
annotations attached to a <Schema>
object.
<Schema name="Rock Sales" metamodelVersion="4.0">
<Annotation name="Author">Fred Flintstone</Annotation>
<Annotation name="Date">10,000 BC</Annotation>
</Annotations>
<Cube name="Sales">
...
Some annotation names are used by convention across several tools. They are as follows:
Annotation | Element(s) | Description |
---|---|---|
AnalyzerBusinessGroup | Level | Used to create folders in the UI |
AnalyzerBusinessGroupDescription | Level | Description for the folders |
AnalyzerDateFormat | Level | Used for relative date filters |
AnalyzerHideInUI | Measure, CalculatedMember | Hides the field in the UI |
AnalyzerDisableDrillLinks | Cube | Disables drillthrough links on the cube |
The most important components of a schema are cubes, measures, attributes and dimensions:
Let's look at the XML definition of a simple schema.
<Schema name="Sales" metamodelVersion="4.0">
<Table name="sales_fact_1997"/>
<Table name="customer"/>
<Table name="time_by_day"/>
<Cube name="Sales">
<Dimension name="Customer" table="customer" key="Id">
<Attribute name="Gender" column="gender"/>
<Attribute name="Id" column="customer_id"/>
</Attributes>
</Dimension>
<Dimension name="Time" table="time_by_day" key="Day">
<Attribute name="Year" column="the_year"/>
<Attribute name="Quarter" column="quarter">
<Key>
<Column name="the_year"/>
<Column name="quarter"/>
</Key>
</Attribute>
<Attribute name="Month" column="month_of_year">
<Key>
<Column name="the_year"/>
<Column name="month_of_year"/>
</Key>
</Attribute>
<Attribute name="Day" column="time_id"/>
</Attributes>
<Hierarchy name="Yearly" hasAll="false">
<Level attribute="Year"/>
<Level attribute="Quarter"/>
<Level attribute="Month"/>
</Hierarchy>
</Hierarchies>
</Dimension>
</Dimensions>
<MeasureGroup name="Sales" table="sales_fact_1997">
<Measures>
<Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
<Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
<Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
</Measures>
<ForeignKeyLink dimension="Customer" foreignKeyColumn="customer_id"/>
<ForeignKeyLink dimension="Time" foreignKeyColumn="time_id"/>
</MeasureGroup>
<CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
</Cube>
</Schema>
This schema contains a single cube, called "Sales". The Sales cube has two dimensions, "Time", and "Gender", and four measures, "Unit Sales", "Store Sales", "Store Cost", and "Profit".
We can write an MDX query on this schema:
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
{Descendants([Time].[Yearly].[1997].[Q1])} ON ROWS
FROM [Sales]
WHERE [Customer].[Gender].[F]
This query refers to the Sales cube ([Sales]
), the
dimensions [Measures]
, [Time]
,
[Customer]
, the hierarchy [Time].[Yearly]
,
the attribute [Customer].[Gender]
, and various members of
those dimensions. The results are as follows:
[Time]
[Measures].[Unit Sales]
[Measures].[Store Sales]
[1997].[Q1]
32,910 $69,798.23 [1997].[Q1].[Jan]
10,932 $23,309.04 [1997].[Q1].[Feb]
10,266 $21,773.93 [1997].[Q1].[Mar]
11,712 $24,715.26
Now let's look at the schema definition in more detail.
A cube (see <Cube>
) is a named collection of dimensions
and measures.
The dimensions are children of a <Dimensions>
holder
element. Even though schema order does not matter, typically the
dimensions are declared first. Then follow the measures, organized
into measure groups and under a <MeasureGroups>
holder
element. (A measure group is a collection of measures that have the
same fact table. The simple cubes we shall be considering at first
have just one measure group; later, we shall see examples of cubes
with more than one measure group.)
The [Sales]
cube in the previous example has two
dimensions. The [Customer]
dimension has
attributes [Gender]
and [Id]
; and
the [Time]
dimension has
attributes [Year]
, [Quarter]
, [Month]
and [Day]
.
The [Sales]
cube's measure group is based on the
table "sales_fact_1997"
. As we shall see later, each
table used in the logical schema must be declared within the physical
schema, and sure enough, the
<PhysicalSchema>
element has a child element
<Table name="sales_fact_1997">
.
The fact table contains the columns from which measures are
calculated, and foreign key columns that link to dimension tables. To
use these columns, Mondrian needs to know about them, so all of these
columns occur somewhere in the
<MeasureGroup>
element. The unit_sales
, store_sales
and store_cost
columns each occur within a
<Measure>
definition; and the
customer_id
and time_id
columns each occur
within a
<ForeignKeyLink>
element, linking the measure group to the [Customer]
and [Time]
dimensions, respectively.
Let's look at the measures defined in the Sales cube's one and only measure group.
<Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
<Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
<Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
Each measure (see <Measure>
) has a name, a column in the fact table, and an
aggregator
. The aggregator is usually "sum", but "count", "min", "max", "avg", and
"distinct-count" are also allowed; "distinct-count" has some limitations if your cube contains a
parent-child hierarchy.
The optional datatype
attribute
specifies how cell values are represented in Mondrian's cache, and how
they are returned via XML for Analysis. The datatype
attribute can have values "String
",
"Integer
", "Numeric
",
"Boolean
", "Date
",
"Time
", and "Timestamp
".
The default is "Numeric
", except for "count
"
and "distinct-count
" measures, which are
"Integer
".
An optional formatString
attribute specifies how the value is to be printed.
Here, we have chosen to output unit sales with no decimal places (since it is an integer), and store sales
with two decimal places (since it is a currency value). The ',' and '.' symbols are locale-sensitive, so if
you were running in Italian, store sales might appear as "48.123,45". You can achieve even more wild effects
using advanced format strings.
A measure can have a caption attribute to be returned by the
Member.getCaption()
method instead of the name. If you do not specify a caption, it will
default to the measure's name. Defining a specific caption might make
sense if you wish to localize the name of the measure as seen by
end-users, or if you wish to display special letters (e.g. Σ or
Π) in the user interface:
<Measure name="Sum X" column="sum_x" aggregator="sum" caption="Σ X"/>
Rather than coming from a column, a measure can use an SQL
expression to calculate its value. To do this, add a calculated column
promotion_sales
to the declaration of the fact table:
<Table name="sales_fact_1997">
<CalculatedColumnDef name="promotion_sales">
<SQL dialect="generic">
(case when <Column name="promotion_id"/> =
0 then 0 else <Column name="store_sales"/> end)
</SQL>
</ColumnDefs>
</Table>
then create the [Promotion Sales]
measure based upon it:
<Measure name="Promotion Sales" aggregator="sum" column="promotion_sales" formatString="#,###.00">
Observe how the
<PhysicalSchema>
is doing its job and gathering implementation details in one
place. The measure definition does not know or care that the
promotion_sales
column is actually calculated, and looks
the same as a measure defined on a regular column. Every time Mondrian
needs to access the calculated column, it will generate the SQL
expression instead.
In this case, sales are only included in the summation if they
correspond to a promotion sale. Arbitrary SQL expressions can be
used, including subqueries. However, the underlying database must be
able to support that SQL expression in the context of an
aggregate. Variations in syntax between different databases is handled
by specifying more than one <Key>
element:
<Attribute name="Quarter">
<Key>
<Column name="the_year"/>
<Column name="quarter"/>
</Key>
</Attribute>
If there is only one key column then <Key>
and keyColumn
are equivalent. Use whichever you prefer:
the former is more uniform, but the latter requires less typing.
We do not need to specify nameColumn
, because it
defaults to the last column in the composite key.
If a dimension table has a composite key, the key attribute for
that dimension will have a composite key. To reference it from a fact
table, you will need to need to create a <ForeignKeyLink>
in
the fact table's <MeasureGroup>
that uses one column in the
fact table for each of the columns of the dimension table's composite
primary key.
The ordinal property of an attribute controls the order in which
members are displayed. Usually attributes are sorted by name. (If name
is not specified, recall that it will be the key, or the last column
in a composite key.) But sometimes name does not give us the order we
want. The [Time].[Month]
attribute is an example of
this:
<Attribute name="Month">
<Key>
<Column name="the_year"/>
<Column name="month"/>
</Key>
</Attribute>
Like [Time].[Quarter]
, Month has a composite key. We
would like 12 months per year covered by the data set, not just 12
months. Unlike Quarter, we have overridden the name. Let's see what
happens if we execute a query.
SELECT [Time].[Month].Members on ROWS
FROM [Sales];
[Time].[Month].&[2011]&[April]
[Time].[Month].&[2011]&[August]
[Time].[Month].&[2011]&[December]
[Time].[Month].&[2011]&[February]
[Time].[Month].&[2011]&[January]
...
The results look entirely arbitrary, until we remember that Mondrian is sorting by name. We got what we asked for, but not what we wanted! We need to tell Mondrian to sort by the key columns:
<Attribute name="Month" nameColumn="month_name" orderByColumn="month">
<Key>
<Column name="the_year"/>
<Column name="month"/>
</Key>
</Attribute>
Now the results are as we expect:
[Time].[Month].&[2011]&[January]
[Time].[Month].&[2011]&[February]
[Time].[Month].&[2011]&[March]
[Time].[Month].&[2011]&[April]
[Time].[Month].&[2011]&[May]
...
Some combinations of attributes in a dimension are often used together. For example, a business user viewing a state might often wish to expand it to see the constituent cities. When viewing a month, they might wish to roll up into quarter or year. For such combinations of attributes, it is convenient to create hierarchies.
Here is the [Time]
dimension, slightly extended from
the earlier example:
<Dimension name="Time" table="time_by_day" key="Day">
<Attribute name="Year" column="the_year"/>
<Attribute name="Quarter">
<Key>
<Column name="the_year"/>
<Column name="quarter"/>
</Key>
</Attribute>
<Attribute name="Month">
<Key>
<Column name="the_year"/>
<Column name="month_of_year"/>
</Key>
</Attribute>
<Attribute name="Week">
<Key>
<Column name="the_year"/>
<Column name="week_of_year"/>
</Key>
</Attribute>
<Attribute name="Day" column="time_id"/>
</Attributes>
<Hierarchy name="Yearly" hasAll="false">
<Level attribute="Year"/>
<Level attribute="Quarter"/>
<Level attribute="Month"/>
<Level attribute="Day"/>
</Hierarchy>
<Hierarchy name="Weekly" hasAll="false">
<Level attribute="Year"/>
<Level attribute="Week"/>
<Level attribute="Day"/>
</Hierarchy>
</Hierarchies>
</Dimension>
You can see that most of the lines of XML in the dimension definition are taken up by attribute definitions. Once the attributes are defined, you can build a hierarchy by choosing which attributes you want, in which order.
In fact, we recommend that you when you first design a schema, you just define attributes; let the business users use the cube without any hierarchies defined, and let them tell you which hierarchies are needed.
By default, every hierarchy contains a top level called
'(All)
', which contains a single member called
'(All {hierarchyName})
'. This member is parent of
all other members of the hierarchy, and thus represents a grand
total.
The all member is also the default member of the hierarchy; that is, the member that is used for calculating cell values when the hierarchy is not included on an axis or in the slicer.
Various attributes allow you to control the all level and member.
The <Hierarchy>
element's allMemberName
and allLevelName
attributes override the default names of
the all level and all member.
If the <Hierarchy>
element
has hasAll="false"
, the 'all' level is suppressed. The
default member of that dimension will now be the first member of the
first level; for example, in a Time hierarchy, it will be the first
year in the hierarchy. Changing the default member can be confusing,
so you should generally use hasAll="true"
.
You can also set the default member of a hierarchy
explicitly. The defaultMember
attribute should be the
fully-qualified name of the default member.
<Hierarchy name="Yearly" hasAll="false" defaultMember="[Time].[1997].[Q1].[1]">
...
</Hierarchy>
When specified in this way, the default member can even be a calculated member.
Although it is easy to build a hierarchy from existing attributes,
the attributes you include in a hierarchy have to be defined with
care. Each attribute must be functionally dependent on the attribute
of the level below it. Thus, there needs to be one and only
one Quarter
for any given Month
, and one and
only one Year
for any given Quarter
. A
Year-Month-Week-Day
hierarchy would not pass muster,
because some of the some of the days in Week 5 belong to January, and
some to Februrary.
Typically some of the attributes in a hierarchy will have composite
keys, to achieve the appropriate functional dependency relationships.
It is useful to remember that including attributes in a hierarchy does
not alter the number of members of that attribute. In a well-formed
hierarchy, each successive level has more members. For example, in a
Year-Quarter-Month-Day
hierarchy over 10 years, the
levels have 10, 40, 120, 3652 members respectively. If
your Quarter
attribute only has 4 members (because you
have forgotten to give it a composite key) the levels have 10, 4, 120
and 3652 members, and that non-increasing sequence should be a sign
that you have done something wrong.
Surprising as it may seem, the MDX language does not know about attributes. It only knows about dimensions, hierarchies, levels and members. Mondrian gets around that using sleight of hand: it generates a single-level hierarchy for each attribute, called an attribute hierarchy.
Attribute hierarchies do not have any special properties. The effect is as if you manually created the hierarchy. But it saves you the effort of defining lots of hierarchies. The net effect is that you can easily define a dozen or so attributes and start playing with them in queries before you start even thinking about hierarchies.
To control whether an attribute has a hierarchy, use
the hasHierarchy
attribute of the <Attribute>
element. Other attributes, mostly corresponding directly to
similarly-named attributes of the <Hierarchy>
element, are
shown in the following table:
Hierarchy attribute | Attribute attribute | Description |
---|---|---|
N/A | hasHierarchy | Whether an attribute has an attribute-hierarchy. |
name | N/A | The name of the hierarchy. An attribute hierarchy always has the same name as the attribute. |
hasAll | hierarchyHasAll | Whether the hierarchy has an 'all' level and member. |
allMemberName | hierarchyAllMemberName | Name of the 'all' member. If not specified, the all member is called 'All <hierarchyName>'. |
allMemberCaption | hierarchyAllMemberCaption | Caption of the 'all' member. If not specified, the member's caption is the same as its name. |
allLevelName | hierarchyAllLevelName | Name of the 'all' level. If not specified, the all level is called '(All)'. |
defaultMember | hierarchyDefaultMember | The default member of the hierarchy. If not specified, the default member is the first member of the first level. (This is the 'all' member, if present.) |
We recommend that you think twice before disabling the 'all' member of an attribute hierarchy, or changing its default member, as this will cause hidden constraints to be in play even when the attribute is not being displayed on an axis.
Attributes are a new feature in Mondrian version 4. In previous versions of Mondrian, you could define what are now called "attribute hierarchies", but you would need to define some fairly verbose XML. Also, the MDX syntax was clunky if you had more than one hierarchy in a given dimension. As a result, schema designers would typically define hierarchies for the 'obviously' hierarchical dimensions like Time and Geography but not expose each level as an attribute that could be used to slice and dice independently. Attributes were an afterthought in most schemas.
From Mondrian 4 onwards, we encourage schema designers to design dimensions with many attributes. Don't worry about creating hierarchies at first. See which combinations of attributes your end-users regularly use together, and consider creating hierarchies to make those drill paths more convenient. But we expect that your end-users will still use the standalone attributes much of the time.
Some attributes have "within parent" and "without parent"
forms. For example, the [Time].[Month]
attribute has 120
members over 10 years, whereas the [Time].[Month of Year]
has just 12 members. The first can be used to compare whether
toboggans sold better in December 2012 sales than in December 2011;
the second can be used to compare whether toboggans sell better in
December than April. You need to define two separate attributes.
There is no easy way to define one in terms of the other, or an
automatic mapping between them. The best you can do is to use a naming
convention, such as "Attribute of Parent", for all such
attributes, so that the correspondence is clear to your end-users.
XML is a suitable language for defining schemas because it can be read and written by both humans and machines. You can hand-write a schema in emacs, vi or notepad, or you can write one in a modeling tool. However, XML can be verbose. This is less of a problem for tools than for humans typing every character.
Therefore Mondrian provides various short cuts. These are alternative ways of specifying constructs that are less verbose. They reduce the amount of typing, and make sure that simple things look simple, without reducing the power of Mondrian's schema language.
One recurring shorthand is to allow an attribute to be used instead of a collection of nested elements, if that collection has only one element.
For example, if you are defining an attribute with a simple (non-composite) key you could either write
<Attribute name="A">
<Key>
<Column name="c">
</Key>
</Attribute>
or
<Attribute name="A" column="c"/>
These are equivalent, but the second is more concise, and you would
probably choose to use it when writing a schema by hand. If the
attribute has a composite key, or if you wish to use
the table
attribute, you have to use the nested
<Key>
element:
<Attribute name="A">
<Key>
<Column table="t1" name="c1"/>
<Column table="t2" name="c2"/>
</Key>
</Attribute>
The nested collection in this case is <Key>
and a set of
<Column>
elements, and the attribute
is keyColumn
. But the pattern occurs elsewhere in the
schema, as shown by the following table.
Parent element | Attribute | Equivalent nested element | Description |
<Attribute> |
keyColumn |
<Key> |
Column(s) that comprise the key of this attribute. |
<Attribute> |
nameColumn |
<Name> |
Column that defines the name of members of this attribute. If not specified, the attribute key is used. |
<Attribute> |
orderByColumn |
<OrderBy> |
Column(s) that define the order of members. If not specified, the attribute key is used. |
<Attribute> |
captionColumn |
<Caption> |
Column that forms the caption of members. If not specified, the attribute name is used. |
<Measure> |
column |
<Arguments> |
Column(s) that are the arguments to this measure. (When this measure is implemented by generating a SQL expression, these columns become the argument to the SQL aggregate function.) |
<Table> |
keyColumn |
<Key> |
Column(s) that form this Table's key. |
<Link> |
foreignKeyColumn |
<ForeignKey> |
Column(s) that form a foreign key from this Link's referencing table to its referenced table. |
<ForeignKeyLink> |
foreignKeyColumn |
<ForeignKey> |
Column(s) that form a foreign key from a measure group's fact table to a dimension table. |
The table
attribute occurs on <Dimension>
,
<Attribute>
and <Column>
elements and, if not
present, is inherited from the enclosing element. This makes
definitions more concise if, for example, a dimension is based on a
single table.
Many attributes have default values. For example, the default value
of the type
attribute of a <Dimension>
element
is "OTHER". Default values are described in the schema reference.
If several cubes in the same schema use dimensions that have similar definitions, consider defining a shared dimension.
This section contains various topics about dimensions.
Measures
dimension Measures are treated as members of a special dimension, called 'Measures'. The dimension has a single hierarchy and a single level.
MDX allows you to omit the hierarchy name if a dimension only has
one hierarchy (including attribute
hierarchies). Therefore [Measures].[Unit Sales]
is a
valid shorthand for [Measures].[Measures].[Unit Sales]
.
That measures belong to a dimension might seem strange. (If you come from a background in relational databases, it seems like declaring that every column is a row.) But it one of the defining properties of the dimensional model, and one of its strengths. Because measures are members, you can change the context from one measure to another in a calculation just as easily as you can time period or sales region. It allows for greater reuse of calculations. It also makes for easier access control, because you can grant on a cell, of which measure is just one coordinate.
The dimensions we have seen so far have been based on a single table. Such dimensions are called star dimensions, because they are arranged around a fact table like the points of a star. It is also possible to have snowflake dimensions, which are based upon two or more dimension tables.
First, make sure that every table in your snowflake dimension is
defined in the <PhysicalSchema>
. You will be referencing
tables by their table usage name. The table usage name is unique
within the schema. It is usually the same as the table name, but can
be specified explicitly using the <Table>
element's alias
attribute if there are two tables of the
same name, and must be specified explicitly for <Query>
and
<InlineTable>
elements.
Next, make sure that there is a path between the tables defined by
one of more <Link>
elements. Usually there will be a direct
link. For example, here are the "product"
and "product_class"
tables that we wish to build
the [Product]
dimension on:
<Table name='product' keyColumn='product_id'/>
<Table name='product_class' keyColumn='product_class_id'/>
<Link target='product' source='product_class' foreignKeyColumn='product_class_id'/>
Mondrian will give an error if it cannot find a path, or if there is more than one path.
Now, define the dimension:
<Dimension name='Product' table='product' key='Product Id'>
<Attribute name='Product Family' table='product_class' keyColumn='product_family'/>
<Attribute name='Product Department' table='product_class'>
<Key>
<Column name='product_family'/>
<Column name='product_department'/>
</Key>
</Attribute>
<Attribute name='Brand Name' table='product_class'>
<Key>
<Column name='product_family'/>
<Column name='product_department'/>
<Column table='product_class' name='brand_name'/>
</Key>
</Attribute>
<Attribute name='Product Name' table='product' keyColumn='product_id' nameColumn='product_name'/>
<Attribute name='Product Id' table='product' keyColumn='product_id'/>
</Attributes>
</Dimension>
The table
attribute occurs in several places in this
example. It can occur on <Dimension>
, <Attribute>
and <Column>
elements. Elements that do not specify a table
inherit their parent element's table. The
<Dimension name='Product' table='product' key='Product Id'>
element establishes a default table of product
for the
dimension.
The <Attribute name='Brand Name' table='product_class'>
element overrides, and its
<Column table='product_class' name='brand_name'>
child
element overrides again.
Think carefully before introducing a snowflake
dimension. Normalizing the dimension table into multiple tables may
save some disk space (although very little on a column-oriented
database or any analytic database with good compression) but
introduces another join, and the cost of that join often outweighs any
benefits. One reasonable case for snowflake dimensions is where you
are heavily using aggregate tables at the granularity of the outer
table. (In the example, this would occur if a lot of important queries
involved the [Product].[Brand Name]
attribute and you
decided to create an aggregate table at that
level. Because product_class
is a separate table, you
could join to it to get columns such as product_family
and product_department
, and you would not need to include
them in the aggregate table.)
Time dimensions based on year/month/week/day are coded differently in the Mondrian schema due to the MDX time related functions such as:
ParallelPeriod([level[, index[, member]]])
PeriodsToDate([level[, member]])
WTD([member])
MTD([member])
QTD([member])
YTD([member])
LastPeriod(index[, member])
Time dimensions have type="TimeDimension"
. The role of an attribute in a time dimension is
indicated by the attributes's levelType
attribute, whose allowable values are as follows:
levelType value |
Meaning |
TimeYears | Level is a year |
TimeHalfYear | Level is a half year |
TimeQuarters | Level is a quarter |
TimeMonths | Level is a month |
TimeWeeks | Level is a week |
TimeDays | Level represents days |
TimeHours | Level represents hours |
TimeMinutes | Level represents minutes |
TimeSeconds | Level represents seconds |
Here is an example of a time dimension:
<Dimension name="Time" type="TimeDimension">
<Hierarchy hasAll="true" allMemberName="All Periods" primaryKey="dateid">
<Table name="datehierarchy"/>
<Level name="Year" column="year" uniqueMembers="true" levelType="TimeYears" type="Numeric"/>
<Level name="Quarter" column="quarter" uniqueMembers="false" levelType="TimeQuarters"/>
<Level name="Month" column="month" uniqueMembers="false" ordinalColumn="month" nameColumn="month_name" levelType="TimeMonths" type="Numeric"/>
<Level name="Week" column="week_in_month" uniqueMembers="false" levelType="TimeWeeks"/>
<Level name="Day" column="day_in_month" uniqueMembers="false" ordinalColumn="day_in_month" nameColumn="day_name" levelType="TimeDays" type="Numeric"/>
</Hierarchy>
</Dimension>
Member properties are defined by the <Property>
element
within an <Attribute>
, like this:
<Attribute name="City" keyColumn="city_id">
<Property attribute="Country"/>
<Property attribute="State"/>
<Property attribute="City Population" name="Population"/>
</Attribute>
We are defining the [City]
attribute and three
properties. Each property is defined in terms of another attribute in
the same dimension. The State
and Country
properties inherit the name of the attribute they are defined from;
the property created from the City Population
attribute
has an explicit name
attribute to rename it
to Population
.
Since properties are defined from attributes, they are not simple values. They have a key, a name, a caption, and a sort order, just like attributes.
The attribute used to define a property must be functionally
dependent. It would be illegal to define a property based on
the [Zipcode]
attribute, for instance, because there
might be more than one zipcode in a given city. But each city has only
one value for state, country, and population.
You can also supply a nested <PropertyFormatter>
element
define a property formatter, which
is explained later.
Once properties have been defined in the schema, all members of
that attribute will have those properties. (This applies to members of
its attribute hierarchy, and also members of a level of an explicit
hierarchy if that level is based on that attribute.) In an MDX query,
you can access those properties via the
member.Properties("propertyName")
function,
for example:
SELECT {[Measures].[Store Sales]} ON COLUMNS,
TopCount(
Filter(
[Customer].[City].Members,
[Customer].[City].CurrentMember.Properties("Population") < 10000),
10,
[Measures].[Store Sales]) ON ROWS
FROM [Sales]
Mondrian deduces the type of the property expression, if it can. If the property name is a
constant string, the type is based upon the type
attribute ("String", "Numeric" or "Boolean")
of the property definition. If the property name is an expression (for example,
CurrentMember.Properties("Popu" + "lation")
), Mondrian will return an untyped
value.
Notice in the time hierarchy example above the ordinalColumn
and
nameColumn
attributes on the <Level>
element. These
affect how levels are displayed in a result. The ordinalColumn
attribute specifies a
column in the Hierarchy table that provides the order of the members in a given level, while the
nameColumn
specifies a column that will be displayed.
For example, in the Month level above, the datehierarchy
table has month (1 .. 12)
and month_name (January, February, ...) columns. The column value that will be used internally within MDX is the
month column, so valid member specifications will be of the form:
[Time].[2005].[Q1].[1]
. Members of the [Month]
level will be displayed in the order January, February, etc.
In a parent-child hierarchy, members are always sorted in hierarchical
order. The ordinalColumn
attribute controls the order that
siblings appear within their parent.
Ordinal columns may be of any datatype which can legally be used in
an ORDER BY clause. Scope of ordering is per-parent, so in the
example above, the day_in_month column should cycle for each month.
Values returned by the JDBC driver should be non-null instances of
java.lang.Comparable
which yield the desired ordering when their
Comparable.compareTo
method is called.
Levels contain a type
attribute, which can have values "String
", "Integer
", "Numeric
", "Boolean
",
"Date
", "Time
", and "Timestamp
".
The default value is "Numeric"
because key columns generally have a numeric type. If it is a
different type, Mondrian needs to know this so it can generate SQL statements
correctly; for example, string values will be generated enclosed in single
quotes:
WHERE productSku = '123-455-AA'
A degenerate dimension is a dimension which is so simple that it isn't worth creating its own dimension table. For example, consider following the fact table:
product_id | time_id | payment_method | customer_id | store_id | item_count | dollars |
55 | 20040106 | Credit | 123 | 22 | 3 | $3.54 |
78 | 20040106 | Cash | 89 | 22 | 1 | $20.00 |
199 | 20040107 | ATM | 3 | 22 | 2 | $2.99 |
55 | 20040106 | Cash | 122 | 22 | 1 | $1.18 |
and suppose we created a dimension table for the values in the payment_method
column:
payment_method |
Credit |
Cash |
ATM |
This dimension table is fairly pointless. It only has 3 values, adds no additional information, and incurs the cost of an extra join.
Instead, you can create a degenerate dimension. To do this, declare a dimension without a table, and Mondrian will assume that the columns come from the fact table.
<Cube name="Checkout">
<!-- The fact table is always necessary. -->
<Table name="checkout">
<Dimension name="Payment method">
<Hierarchy hasAll="true">
<!-- No table element here. Fact table is assumed. -->
<Level name="Payment method" column="payment_method" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<!-- other dimensions and measures -->
</Cube>
Note that because there is no join, the foreignKey
attribute of
Dimension
is not necessary, and the Hierarchy
element has no <Table>
child element or
primaryKey
attribute.
The <Level>
element allows specifying the optional attribute approxRowCount
. Specifying
approxRowCount
can improve performance by reducing the need to determine level, hierarchy, and dimension cardinality.
This can have a significant impact when connecting to Mondrian via XMLA.
The <Cube>
element allows specifying the
optional attribute "defaultMeasure".
Specifying defaultMeasure
in the <Cube>
element allows users
to explicitly specify any base measure as a default Measure.
Note that if a default measure is not specified it takes the first measure defined in the cube as the default measure.
Specifying the defaultMeasure
explicitly would be useful in
cases where you would want a calculated member to be picked up as the default
measure.
<Cube name="Sales" defaultMeasure="Unit Sales">
...
<CalculatedMember name="Profit" dimension="Measures">
...
</Cube>
In some circumstances, it may be possible to optimize performance by taking advantage of known functional dependencies in the data being processed. Such dependencies are typically the result of business rules associated with the systems producing the data, and often cannot be inferred just by looking at the data itself.
Functional dependencies are declared to Mondrian using the
dependsOnLevelValue
attribute of the
<Property>
element and the
uniqueKeyLevelName
attribute of the <Hierarchy>
element.
The dependsOnLevelValue
attribute of a
member property is used to indicate
that the value of the member property is functionally dependent on the value
of the <Level>
in which the member property is
defined. In other words, for a given value of the level, the value of the
property is invariant.
The uniqueKeyLevelName
attribute of a
<Hierarchy>
is used to indicate that the given
level (if any) taken together with all higher levels in the hierarchy acts as a
unique alternate key, ensuring that for any unique combination of those level values,
there is exactly one combination of values for all levels below it.
To illustrate, consider a hierarchy modeling cars built and licensed in the United States:
<Dimension name="Automotive" foreignKey="auto_dim_id">
<Hierarchy hasAll="true" primaryKey="auto_dim_id" uniqueKeyLevelName="Vehicle Identification Number">
<Table name="automotive_dim"/>
<Level name="Make" column="make_id" type="Numeric"/>
<Level name="Model" column="model_id" type="Numeric"/>
<Level name="ManufacturingPlant" column="plant_id" type="Numeric"/>
<Property name="State" column="plant_state_id" type="Numeric" dependsOnLevelValue="true"/>
<Property name="City" column="plant_city_id" type="Numeric" dependsOnLevelValue="true"/>
<Level name="Vehicle Identification Number" column="vehicle_id" type="Numeric"/>
<Property name="Color" column="color_id" type="Numeric" dependsOnLevelValue="true"/>
<Property name="Trim" column="trim_id" type="Numeric" dependsOnLevelValue="true"/>
<Level name="LicensePlateNum" column="license_id" type="String"/>
<Property name="State" column="license_state_id" type="Numeric" dependsOnLevelValue="true"/>
</Hierarchy>
</Dimension>
In the above example, we know that a given manufacturing plant exists only in a single city and state, that a given car only has one color scheme and one trim level, and that the license number is associated with a single state. Therefore, we can state that all of these member properties are functionally dependent on the associated level values.
Additionally, we know that the Vehicle Identification Number uniquely identifies each car, and that each car only has one license. Thus, we know that the combination of Make, Model, Manufacturing Plant, and Vehicle Identification Number uniquely identifies each vehicle; the license number is redundant.
These attributes enable optimization of the GROUP BY clause in the SQL statements Mondrian generates. Absent any functional dependency information, a typical query on the Automotive dimension would look something like:
SELECT
`automotive_dim`.`make_id` AS c0,
`automotive_dim`.`model_id` AS c1,
`automotive_dim`.`plant_id` AS c2,
`automotive_dim`.`plant_state_id` AS c3,
`automotive_dim`.`plant_city_id` AS c4,
`automotive_dim`.`vehicle_id` AS c5,
`automotive_dim`.`color_id` AS c6,
`automotive_dim`.`trim_id` AS c7,
`automotive_dim`.`license_id` AS c8,
`automotive_dim`.`license_state_id` AS c9
FROM
`automotive_dim` AS `automotive_dim`,
GROUP BY
`automotive_dim`.`make_id`,
`automotive_dim`.`model_id`,
`automotive_dim`.`plant_id`,
`automotive_dim`.`plant_state_id`,
`automotive_dim`.`plant_city_id`,
`automotive_dim`.`vehicle_id`,
`automotive_dim`.`color_id`,
`automotive_dim`.`trim_id`,
`automotive_dim`.`license_id`,
`automotive_dim`.`license_state_id`
ORDER BY
`...
Given the functional dependence attributes in the schema example above, however, we know that the query is selecting at a depth that includes the "unique key" level, and that all properties in the query are also functionally dependent on their levels. In such cases the GROUP BY clause is redundant and may be eliminated completely, increasing SQL query performance significantly on some databases:
SELECT
`automotive_dim`.`make_id` AS c0,
`automotive_dim`.`model_id` AS c1,
`automotive_dim`.`plant_id` AS c2,
`automotive_dim`.`plant_state_id` AS c3,
`automotive_dim`.`plant_city_id` AS c4,
`automotive_dim`.`vehicle_id` AS c5,
`automotive_dim`.`color_id` AS c6,
`automotive_dim`.`trim_id` AS c7,
`automotive_dim`.`license_id` AS c8,
`automotive_dim`.`license_state_id` AS c9
FROM
`automotive_dim` AS `automotive_dim`,
ORDER BY
`...
Had the query not been deep enough to include the "unique key" level, or had any of the member properties not been functionally dependent on their level, this optimization would not be possible.
In some cases, a different optimization can be made where there is no "unique key" level, but some or all of the member properties are functionally dependent on their level. Some databases (notably MySQL) permit columns to be listed in the SELECT clause that do not also appear in the GROUP BY clause. On such databases, Mondrian can simply leave the functionally dependent member properties out of the GROUP BY, which may reduce SQL query processing time substantially:
SELECT
`automotive_dim`.`make_id` AS c0,
`automotive_dim`.`model_id` AS c1,
`automotive_dim`.`plant_id` AS c2,
`automotive_dim`.`plant_state_id` AS c3,
`automotive_dim`.`plant_city_id` AS c4,
`automotive_dim`.`vehicle_id` AS c5,
`automotive_dim`.`color_id` AS c6,
`automotive_dim`.`trim_id` AS c7,
`automotive_dim`.`license_id` AS c8,
`automotive_dim`.`license_state_id` AS c9
FROM
`automotive_dim` AS `automotive_dim`,
GROUP BY
`automotive_dim`.`make_id`,
`automotive_dim`.`model_id`,
`automotive_dim`.`plant_id`,
`automotive_dim`.`vehicle_id`,
`automotive_dim`.`license_id`,
ORDER BY
`...
Please note that significant changes are anticipated to the schema syntax in Mondrian 4.0, including a new approach to declaring functional dependencies. While the expectation is that the 4.0 schema processor will maintain backward compatibility with schemas developed for Mondrian 3.1, these are transitional attributes introduced to allow support in the interim, and 4.0 will not be backward compatible with them. Therefore, any schema using these attributes will need to be migrated to the new syntax as part of upgrading to Mondrian 4.0.
A physical schema defines a set of tables, columns and relationships between them. The relationships are called links.
The goal of the physical schema is to isolate the logical schema (cubes, dimensions, measures and so forth) from the actual database. For example, a measure can be based upon a column and does not need to know whether it is a real column or a column calculated using a SQL expression.
Because the isolation layer exists, you can change the implementation. For example, you could change a calculated column into a real column, or change a query or inline table into a real table, without changing the logical schema.
A table is a named use of a database table.
Tables are declared using
the <Table>
element. If
the fact table is not in the default schema, you can provide an
explicit schema using the "schema" attribute; for example:
<Table schema="Foodmart" name="sales_fact_1997"/>
Inside a table, you can optionally define <Column>
elements. For example:
<Table name='customer'>
<ColumnDef name='customer_id' type='Integer' internalType='int'/>
<ColumnDef name='fname'/>
<ColumnDef name='lname'/>
<CalculatedColumnDef name='full_name' type='String'>
<SQL dialect='mysql'>
</SQL>
<SQL dialect='generic'>
<Column name='fullname'/>
</SQL>
</ColumnDefs>
</Table>
If you do not include a <ColumnDefs>
section, Mondrian
will read column definitions from JDBC. Such single-line table
definitions are adequate for most purposes.
The
<ColumnDef name='customer_id' type='Integer' internalType='int'>
declares that the customer
table has
a customer_id
column and that we want to use it as an
integer. This affects how the column is sorted, and affects the MDX
type of expressions built from this column. The underlying column
might have a different type such as FLOAT
or VARCHAR(20)
.
The attribute internalType='int'
tells Mondrian how to
represent the value internally. The value int
means that
Mondrian will use getInt
JDBC calls to access the values,
and store them using Java int
values.
The next two <ColumnDef>
elements just make explicit the
fact that we expect the fname
and lname
columns to exist.
The <CalculatedColumnDef>
defines a column using a SQL
expression. Note that you can include SQL for multiple
dialects. Mondrian will choose the right SQL expression for the
current backend database. (This is very useful when defining a
packaged application that needs to run unchanged on multiple backend
databases.)
The SQL can include embedded <Column>
elements. Mondrian
replaces these with references to columns, qualifying with a table
alias and quoting identifiers appropriately for the current SQL
dialect. (This might become `customer`.`customer_id`
on
MySQL, or "customer"."customer_id"
on
Oracle.
You can also use the <Query>
construct to build more complicated SQL statements.
The <InlineTable>
construct allows
you to define a dataset in the schema file. You must declare the names of the columns, the column types
("String" or "Numeric"), and a set of rows. As for
<Table>
and
<View>
, you must provide a unique alias with which
to refer to the dataset.
Here is an example:
<Dimension name="Severity">
<Hierarchy hasAll="true" primaryKey="severity_id">
<InlineTable alias="severity">
<ColumnDef name="id" type="Numeric"/>
<ColumnDef name="desc" type="String"/>
</ColumnDefs>
<Rows>
<Row>
</Row>
<Row>
</Row>
<Row>
</Row>
</Rows>
</InlineTable>
<Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
This has the same effect as if you had a table called 'severity' in your database:
id | desc |
1 | High |
2 | Medium |
3 | Low |
and the declaration
<Dimension name="Severity">
<Hierarchy hasAll="true" primaryKey="severity_id">
<Table name="severity"/>
<Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
To specify a NULL value for a column, omit the <Value>
for that column, and the column's value will default to NULL.
A <Query>
element defines a table by executing a SQL
statement. It is similar to a view definition in an RDBMS. As for
<CalculatedColumnDef>
, you can provide a different SQL
expression for different SQL dialects.
Example:
<Query name='american_customers'>
<SQL dialect='generic'>
SELECT * FROM customer WHERE country = 'USA'
</SQL>
</Query>
A <Link>
defines a relationship between two tables. The
relationship is directed, in the same sense as a foreign key. It is
used as an automatic join path when the tables are used in
a snowflake
dimension.
Here is how to define a link between the infamous emp
and dept
tables.
<Table name='emp'>
<Key>
<Column name='empno'/>
</Key>
</Table>
<Table name='dept'>
<Key>
<Column name='deptno'/>
</Key>
</Table>
<Link target='emp' source='dept'>
<Column name='deptno'/>
</ForeignKey>
</Link>
And here is a shorter, equivalent, version
using keyColumn
and foreignKeyColumn
attributes. These short cuts are
possible because the keys are not composite.
<Table name='emp' keyColumn='empno'/>
<Table name='dept' keyColumn='deptno'/>
<Link target='emp' source='dept' foreignKeyColumn='deptno'/>
Some other table-to-table connections in Mondrian schemas do not
automatically use links. For instance, you need to provide an explicit
<ForeignKeyLink>
to connect the sales_fact_1997
fact table to the customer
dimension table.
(Rationale as follows. If we used links for purposes such as connecting fact to dimension tables, many schemas would become cyclic or ambiguous (there might be multiple paths between various tables). But we still think it is useful to be able define links in a diagram, as a guideline for what join paths make sense. After all, doesn't everyone wish there were a schema diagram for every schema? A schema modeler would, we hope, include a diagramming tool, and if links are present, it would suggest that they are used when defining connections in the schema.)
Mondrian supports a limited set of database-specific hints for the
<Table>
element, which will then be
passed on to SQL queries involving the table. These hints are as
follows:
Database | Hint Type | Permitted Values | Description |
MySQL | force_index |
The name of an index on this table | Forces the named index to be used when selecting level values from this table. |
For example:
<Table name="automotive_dim">
</Table>
As with the functional dependency optimizations, support for table hints is in a transitional stage, and are likely to change in Mondrian 4.0. Any schema using them may need to be migrated to the new schema syntax as part of upgrading to Mondrian 4.0.
We saw earlier how to build a cube based upon a fact table, and dimensions in the fact table ("Payment method") and in a table joined to the fact table ("Gender"). This is the most common kind of mapping, and is known as a star schema.
But a dimension can be based upon more than one table, provided that there is a well-defined
path to join these tables to the fact table. This kind of dimension is known as a snowflake,
and is defined using the <Join>
operator. For example:
<Cube name="Sales">
...
<Dimension name="Product" foreignKey="product_id">
<Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="product">
<Join leftKey="product_class_key" rightAlias="product_class" rightKey="product_class_id">
<Table name="product"/>
<Join leftKey="product_type_id" rightKey="product_type_id">
<Table name="product_class"/>
<Table name="product_type"/>
</Join>
</Join>
<!-- Level declarations ... -->
</Hierarchy>
</Dimension>
</Cube>
This defines a "Product"
dimension consisting of three tables. The
fact table joins to "product"
(via the foreign key
"product_id"
), which joins to "product_class"
(via the foreign
key "product_class_id"
), which joins to "
product_type"
(via the foreign key "product_type_id"
). We require
a <Join>
element nested within a <Join>
element because <Join>
takes two operands; the operands
can be tables, joins, or even queries.
The arrangement of the tables seems complex; the simple rule of thumb is to order the tables
by the number of rows they contain. The "product"
table has the most
rows, so it joins to the fact table and appears first; "product_class"
has fewer rows, and "product_type"
, at the tip of the snowflake, has
least of all.
Note that the outer <Join>
element has a
rightAlias
attribute. This is necessary because the right component of the join (the inner
<Join>
element) consists of more than one table. No
leftAlias
attribute is necessary in this case, because the leftKey
column unambiguously comes from the "product"
table.
When generating the SQL for a join, Mondrian needs to know which column to join to. If you are joining to a join, then you need to tell it which of the tables in the join that column belongs to (usually it will be the first table in the join).
Because shared dimensions don't belong to a cube, you have to give them an explicit table
(or other data source). When you use them in a particular cube, you specify the foreign key. This
example shows the Store Type
dimension being joined to the
Sales
cube using the sales_fact_1997.store_id
foreign key, and to the Warehouse
cube using the
warehouse.warehouse_store_id
foreign key:
<Dimension name="Store Type">
<Hierarchy hasAll="true" primaryKey="store_id">
<Table name="store"/>
<Level name="Store Type" column="store_type" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Cube name="Sales">
<Table name="sales_fact_1997"/>
...
<DimensionUsage name="Store Type" source="Store Type" foreignKey="store_id"/>
</Cube>
<Cube name="Warehouse">
<Table name="warehouse"/>
...
<DimensionUsage name="Store Type" source="Store Type" foreignKey="warehouse_store_id"/>
</Cube>
A virtual cube combines two or more regular cubes. It is defined by the <VirtualCube>
element:
<VirtualCube name="Warehouse and Sales">
<CubeUsage cubeName="Sales" ignoreUnrelatedDimensions="true"/>
<CubeUsage cubeName="Warehouse"/>
</CubeUsages>
<VirtualCubeDimension cubeName="Sales" name="Customers"/>
<VirtualCubeDimension cubeName="Sales" name="Education Level"/>
<VirtualCubeDimension cubeName="Sales" name="Gender"/>
<VirtualCubeDimension cubeName="Sales" name="Marital Status"/>
<VirtualCubeDimension name="Product"/>
<VirtualCubeDimension cubeName="Sales" name="Promotion Media"/>
<VirtualCubeDimension cubeName="Sales" name="Promotions"/>
<VirtualCubeDimension name="Store"/>
<VirtualCubeDimension name="Time"/>
<VirtualCubeDimension cubeName="Sales" name="Yearly Income"/>
<VirtualCubeDimension cubeName="Warehouse" name="Warehouse"/>
<VirtualCubeMeasure cubeName="Sales" name="[Measures].[Sales Count]"/>
<VirtualCubeMeasure cubeName="Sales" name="[Measures].[Store Cost]"/>
<VirtualCubeMeasure cubeName="Sales" name="[Measures].[Store Sales]"/>
<VirtualCubeMeasure cubeName="Sales" name="[Measures].[Unit Sales]"/>
<VirtualCubeMeasure cubeName="Sales" name="[Measures].[Profit Growth]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Store Invoice]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Supply Time]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Units Ordered]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Units Shipped]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Warehouse Cost]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Warehouse Profit]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Warehouse Sales]"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Average Warehouse Sale]"/>
<CalculatedMember name="Profit Per Unit Shipped" dimension="Measures">
</VirtualCube>
The <CubeUsages>
element is optional. It specifies the cubes that are imported into the virtual cube.
Holds CubeUsage elements.
The <CubeUsage>
element is optional. It specifies the base cube that is imported into the
virtual cube. Currently it is possible to define a VirtualCubeMeasure and
similar imports from a base cube without defining CubeUsage for the cube.
The cubeName
attribute specifies the base cube being imported.
The ignoreUnrelatedDimensions
attribute specifies that the measures
from this base cube will have non joining dimension members pushed to the
top level member. This behaviour is currently supported for aggregation.
This attribute is by default false.
ignoreUnrelatedDimensions
is an experimental feature similar to
the similarly named feature in SSAS 2005.
MSDN documentation
mentions "When IgnoreUnrelatedDimensions is true, unrelated dimensions are forced
to their top level; when the value is false, dimensions are not forced to their
top level. This property is similar to the Multidimensional Expressions
(MDX) ValidMeasure function". Current Mondrian implementation of
ignoreUnrelatedDimensions
depends on use of ValidMeasure. E.g. If we
want to apply this behaviour to "Unit Sales" measure in the "Warehouse and Sales"
virtual cube then we need to define a CubeUsage entry for "Sales" cube as shown
in the example above and also wrap this measure with ValidMeasure.
The <VirtualCubeDimension>
element imports a dimension from one of the constituent cubes. If you do not
specify the cubeName
attribute, this means you
are importing a shared dimension. (If a shared dimension is used more than once
in a cube, there is no way, at present, to disambiguate which usage of the
shared dimension you intend to import.)
The <VirtualCubeMeasure>
element imports a measure from one of the constituent cubes. It is imported with
the same name. If you want to create a formula, or just to rename a measure as
you import it, use the <CalculatedMember>
element.
Virtual cubes occur surprisingly frequently in real-world applications. They occur when you have fact tables of different granularities (say one measured at the day level, another at the month level), or fact tables of different dimensionalities (say one on Product, Time and Customer, another on Product, Time and Warehouse), and want to present the results to an end-user who doesn't know or care how the data is structured.
Any common dimensions -- shared dimensions which are used by both constituent
cubes -- are automatically synchronized. In this example, [Time]
and [Product]
are common dimensions. So if the context is ([Time].[1997].[Q2]
,
[Product].[Beer].[Miller Lite]
), measures from either cube will
relate to this context.
Dimensions which only belong to one cube are called non-conforming
dimensions. The [Gender]
dimension is an example of this: it exists
in the Sales
cube but not Warehouse
. If the context is
([Gender].[F]
, [Time].[1997].[Q1]
), it makes sense to
ask the value of the [Unit Sales]
measure (which comes from the
[Sales]
cube) but not the [Units Ordered]
measure (from
[Warehouse]
). In the context of [Gender].[F]
, [Units
Ordered]
has value NULL.
A conventional hierarchy has a rigid set of levels, and members which adhere to those
levels. For example, in the Product
hierarchy, any member of the Product Name
level has a parent in the Brand Name
level, which has a parent in the
Product Subcategory
level, and so forth. This structure is sometimes too rigid
to model real-world data.
A parent-child hierarchy has only one level (not counting the special 'all' level),
but any member can have parents in the same level. A classic example is the reporting structure
in the Employees
hierarchy:
<Dimension name="Employees" foreignKey="employee_id">
<Hierarchy hasAll="true" allMemberName="All Employees" primaryKey="employee_id">
<Table name="employee"/>
<Level name="Employee Id" uniqueMembers="true" type="Numeric" column="employee_id" nameColumn="full_name" parentColumn="supervisor_id" nullParentValue="0">
<Property name="Marital Status" column="marital_status"/>
<Property name="Position Title" column="position_title"/>
<Property name="Gender" column="gender"/>
<Property name="Salary" column="salary"/>
<Property name="Education Level" column="education_level"/>
<Property name="Management Role" column="management_role"/>
</Level>
</Hierarchy>
</Dimension>
The important attributes here are parentColumn
and nullParentValue
:
parentColumn
attribute is the name of the
column which links a member to its parent member; in this case, it is
the foreign key column which points to an employee's supervisor. The <ParentExpression>
child element of <Level>
is equivalent to the parentColumn
attribute, but allows you to define an arbitrary SQL expression, just
like the <Expression>
element. The parentColumn
attribute (or <ParentExpression>
element) is the
only indication to Mondrian that a hierarchy has a parent-child
structure.nullParentValue
attribute is the value which
indicates that a member has no parent. The default is nullParentValue="null"
,
but since many database don't index null values, schema designers
sometimes use values as the empty string, 0, and -1 instead.There's one serious problem with the parent-child hierarchy defined above, and that is the amount of work Mondrian has to do in order to compute cell-totals. Let's suppose that the employee table contains the following data:
employee | ||
supervisor_id | employee_id | full_name |
null | 1 | Frank |
1 | 2 | Bill |
2 | 3 | Eric |
1 | 4 | Jane |
3 | 5 | Mark |
2 | 6 | Carla |
If we want to compute the total salary budget for Bill, we need to add in the salaries of Eric
and Carla (who report to Bill) and Mark (who reports to Eric). Usually Mondrian generates a
SQL GROUP BY
statement to compute these totals, but there is no
(generally available) SQL construct which can traverse hierarchies. So by default,
Mondrian generates one SQL statement per supervisor, to retrieve and total all of that
supervisor's direct reports.
This approach has a couple of drawbacks. First, the performance is not very good if a hierarchy contains more than a hundred members. Second, because Mondrian implements the distinct-count aggregator by generating SQL, you cannot define a distinct-count measure in any cube which contains a parent-child hierarchy.
How can we solve these problems? The answer is to enhance the data so that Mondrian is able to retrieve the information it needs using standard SQL. Mondrian supports a mechanism called a closure table for this purpose.
A closure table is a SQL table which contains a record for every employee/supervisor
relationship, regardless of depth. (In mathematical terms, this is called the 'reflexive
transitive closure' of the employee/supervisor relationship. The distance
column is not strictly required, but it makes it easier to populate the table.)
employee_closure | ||
supervisor_id | employee_id | distance |
1 | 1 | 0 |
1 | 2 | 1 |
1 | 3 | 2 |
1 | 4 | 1 |
1 | 5 | 3 |
1 | 6 | 2 |
2 | 2 | 0 |
2 | 3 | 1 |
2 | 5 | 2 |
2 | 6 | 1 |
3 | 3 | 0 |
3 | 5 | 1 |
4 | 4 | 0 |
5 | 5 | 0 |
6 | 6 | 0 |
In the catalog XML, the <Closure>
element maps
the level onto a <Table>
:
<Dimension name="Employees" foreignKey="employee_id">
<Hierarchy hasAll="true" allMemberName="All Employees" primaryKey="employee_id">
<Table name="employee"/>
<Level name="Employee Id" uniqueMembers="true" type="Numeric"
column="employee_id" nameColumn="full_name" parentColumn="supervisor_id" nullParentValue="0">
<Closure parentColumn="supervisor_id" childColumn="employee_id">
<Table name="employee_closure"/>
</Closure>
<Property name="Marital Status" column="marital_status"/>
<Property name="Position Title" column="position_title"/>
<Property name="Gender" column="gender"/>
<Property name="Salary" column="salary"/>
<Property name="Education Level" column="education_level"/>
<Property name="Management Role" column="management_role"/>
</Level>
</Hierarchy>
</Dimension>
This table allows totals to be evaluated in pure SQL. Even though this introduces an extra
table into the query, database optimizers are very good at handling joins. I recommend that
you declare both supervisor_id
and employee_id
NOT NULL, and index
them as follows:
CREATE UNIQUE INDEX employee_closure_pk ON employee_closure (
supervisor_id,
employee_id);
CREATE INDEX employee_closure_emp ON employee_closure (
employee_id);
The table needs to be re-populated whenever the hierarchy changes, and it is the application's responsibility to do so — Mondrian does not do this!
If you are using Pentaho Data Integration (Kettle), there is a special step to populate closure tables as part of the ETL process. Further details in the Pentaho Data Integration wiki.
![]() Closure Generator step in Pentaho Data Integration |
If you are not using Pentaho Data Integration, you can populate the table yourself using SQL. Here is an example of a MySQL stored procedure that populates a closure table.
DELIMITER //
CREATE PROCEDURE populate_employee_closure()
BEGIN
DECLARE distance int;
TRUNCATE TABLE employee_closure;
SET distance = 0;
-- seed closure with self-pairs (distance 0)
INSERT INTO employee_closure (supervisor_id, employee_id, distance)
SELECT employee_id, employee_id, distance
FROM employee;
-- for each pair (root, leaf) in the closure,
-- add (root, leaf->child) from the base table
REPEAT
SET distance = distance + 1;
INSERT INTO employee_closure (supervisor_id, employee_id, distance)
SELECT employee_closure.supervisor_id, employee.employee_id, distance
FROM employee_closure, employee
WHERE employee_closure.employee_id = employee.supervisor_id
AND employee_closure.distance = distance - 1;
UNTIL (ROW_COUNT() == 0))
END REPEAT;
END //
DELIMITER ;
Suppose you want to create a measure whose value comes not from a column of the fact table,
but from an MDX formula. One way to do this is to use a WITH MEMBER
clause, like
this:
WITH MEMBER [Measures].[Profit] AS '[Measures].[Store
Sales]-[Measures].[Store Cost]',
FORMAT_STRING = '$#,###'
SELECT {[Measures].[Store Sales], [Measures].[Profit]} ON COLUMNS,
{[Product].Children} ON ROWS
FROM [Sales]
WHERE [Time].[1997]
But rather than including this clause in every MDX query of your application, you can define the member in your schema, as part of your cube definition:
<CalculatedMember name="Profit" dimension="Measures">
<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
You can also declare the formula as an XML attribute, if you prefer. The effect is just the same.
<CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales]-[Measures].[Store Cost]">
<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
Note that the <CalculatedMemberProperty>
(not <Property>
) element corresponds
to the FORMAT_STRING = '$#,###'
fragment of the MDX statement. You can define
other properties here too, but FORMAT_STRING
is by far the most useful in practice.
The FORMAT_STRING
property value can also be evaluated using an expression.
When formatting a particular cell, first the expression is evaluated to yield a format string,
then the format string is applied to the cell value. Here is the same property with a conditional
format string:
<CalculatedMemberProperty name="FORMAT_STRING" expression="Iif(Value < 0, '|($#,##0.00)|style=red', '|$#,##0.00|style=green')"/>
For more details about format strings, see the MDX specification.
One additional calculated member property that is worth mentioning is DATATYPE.
As with measures,
setting datatype specifies how the calculated member is returned via XML for Analysis.
The DATATYPE property of a calculated member can have values "String
", "Integer
", or
"Numeric
":
<CalculatedMemberProperty name="DATATYPE" value="Numeric"/>
You can specify SOLVE_ORDER for the calculated member property. Solve order determines the priority of calculation in the event of competing expressions
<CalculatedMemberProperty name="SOLVE_ORDER" value="2000"/>
You can make a calculated member or a measure invisible. If you specify visible="false"
(the default is "true") in the <Measure>
or
<CalculatedMember>
element, user-interfaces such as
JPivot will notice this property and hide the member. This is useful if you want to perform
calculations in a number of steps, and hide intermediate steps from end-users. For example,
here only "Margin per Sqft" is visible, and its factors "Store Cost", "Margin" and "Store Sqft"
are hidden:
<Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00" visible="false"/>
<CalculatedMember name="Margin" dimension="Measures" visible="false">
<CalculatedMember name="Store Sqft" dimension="Measures" visible="false">
<CalculatedMember name="Margin per Sqft" dimension="Measures" visible="true">
<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
The WITH SET
clause of an MDX statement allows you to declare a set expression
which can be used throughout that query. For example,
WITH SET [Top Sellers] AS
'TopCount([Warehouse].[Warehouse Name].MEMBERS, 5,
[Measures].[Warehouse Sales])'
SELECT
{[Measures].[Warehouse Sales]} ON COLUMNS,
{[Top Sellers]} ON ROWS
FROM [Warehouse]
WHERE [Time].[Year].[1997]
The WITH SET
clause is very similar to the WITH MEMBER
clause,
and as you might expect, it has a construct in schema analogous to
<CalculatedMember>
. The
<NamedSet>
element allows you to define a
named set in your schema as part of a cube definition. It is implicitly available for
any query against that cube:
<Cube name="Warehouse">
...
<NamedSet name="Top Sellers">
</NamedSet>
</Cube>
SELECT
{[Measures].[Warehouse Sales]} ON COLUMNS,
{[Top Sellers]} ON ROWS
FROM [Warehouse]
WHERE [Time].[Year].[1997]
Warehouse | Warehouse Sales |
Treehouse Distribution | 31,116.37 |
Jorge Garcia, Inc. | 30,743.77 |
Artesia Warehousing, Inc. | 29,207.96 |
Jorgensen Service Storage | 22,869.79 |
Destination, Inc. | 22,187.42 |
A named set defined against a cube is not inherited by a virtual cube defined against that cube. (But you can define a named set against a virtual cube.)
You can also define a named set as global to a schema:
<Schema>
<Cube name="Sales" ... />
<Cube name="Warehouse" ... />
<VirtualCube name="Warehouse and Sales" .../>
<NamedSet name="CA Cities" formula="{[Store].[USA].[CA].Children}"/>
<NamedSet name="Top CA Cities">
</NamedSet>
</Schema>
A named set defined against a schema is available in all cubes and virtual cubes in that
schema. However, it is only valid if the cube contains dimensions with the names required to
make the formula valid. For example, it would be valid to use [CA Cities]
in
queries against the [Sales]
and [Warehouse and Sales]
cubes, but
if you used it in a query against the [Warehouse]
cube you would get an error,
because [Warehouse]
does not have a [Store]
dimension.
Sometimes Mondrian's schema language isn't flexible enough, or the MDX language isn't powerful enough, to solve the problem at hand. What you want to do is add a little of your own Java code into the Mondrian application, and a plug-in is a way to do this.
Each of Mondrian's extensions is technically a Service Provider Interface (SPI); in short, a Java interface which you write code to implement, and which Mondrian will call at runtime. You also need to register an extension (usually somewhere in your schema.xml file) and to ensure that it appears on the classpath.
Plug-ins include user-defined functions; cell, member and property formatters; dynamic schema processors and data source change listeners. There is incomplete support for member readers and cell readers, and in future we may support pluggable SQL dialects.
Some plug-ins (user-defined function, member formatter, property formatter, cell formatter) can be implemented in a scripting language such as JavaScript. In this case, you do not need to write a Java class; you just enclose the script code within a Script element in the mondrian schema file. Extensions implemented in scripting languages do not in general perform as well as extensions implemented in Java, but they are much more convenient because you do not need to compile any code. Just modify the script code in the mondrian schema file and re-load the schema. The shorter code-debug-fix cycle allows you to develop your application much faster. Once you have implemented the plug-in in script, if performance is still a concern, you can translate your plug-in into Java.
Other extensions include Dynamic datasource XMLA servlet.
A user-defined function must have a public constructor and implement the mondrian.spi.UserDefinedFunction interface. For example,
package com.example;
import mondrian.olap.*;
import mondrian.olap.type.*;
import mondrian.spi.UserDefinedFunction;
/**
* A simple user-defined function which adds one to its argument.
*/
public class PlusOneUdf implements UserDefinedFunction {
// public constructor
public PlusOneUdf() {
}
public String getName() {
return "PlusOne";
}
public String getDescription() {
return "Returns its argument
plus one";
}
public Syntax getSyntax() {
return Syntax.Function;
}
public Type getReturnType(Type[] parameterTypes) {
return new NumericType();
}
public Type[] getParameterTypes() {
return new Type[] {new
NumericType()};
}
public Object execute(Evaluator evaluator, Exp[]
arguments) {
final Object argValue =
arguments[0].evaluateScalar(evaluator);
if (argValue instanceof
Number) {
return new Double(((Number) argValue).doubleValue() + 1);
} else {
//
Argument might be a RuntimeException indicating that
//
the cache does not yet have the required cell value. The
//
function will be called again when the cache is loaded.
return null;
}
}
public String[] getReservedWords() {
return null;
}
}
Declare it in your schema:
<Schema ...>
...
<UserDefinedFunction name="PlusOne" className="com.example.PlusOneUdf"/>
</Schema>
And use it in any MDX statement:
WITH MEMBER [Measures].[Unit Sales Plus One]
AS 'PlusOne([Measures].[Unit
Sales])'
SELECT
{[Measures].[Unit Sales]} ON COLUMNS,
{[Gender].MEMBERS} ON ROWS
FROM [Sales]
If a user-defined function has a public constructor with one string argument, Mondrian will pass in the function's name. Why? This allows you to define two or more user-defined functions using the same class:
package com.example;
import mondrian.olap.*;
import mondrian.olap.type.*;
import mondrian.spi.UserDefinedFunction;
/**
* A user-defined function which either adds one to or
* subtracts one from its argument.
*/
public class PlusOrMinusOneUdf implements UserDefinedFunction {
private final name;
private final isPlus;
// public constructor with one argument
public PlusOneUdf(String
name) {
this.name = name;
if (name.equals("PlusOne")) {
isPlus = true;
} else if
(name.equals("MinusOne")) {
isPlus = false;
} else {
throw new IllegalArgumentException("Unexpected name " + name);
}
}
public String getName() {
return name;
}
public String getDescription() {
return "Returns its argument
plus or minus one";
}
public Syntax getSyntax() {
return Syntax.Function;
}
public Type getReturnType(Type[] parameterTypes) {
return new NumericType();
}
public Type[] getParameterTypes() {
return new Type[] {new
NumericType()};
}
public Object execute(Evaluator evaluator, Exp[]
arguments) {
final Object argValue =
arguments[0].evaluateScalar(evaluator);
if (argValue instanceof
Number) {
if (isPlus) {
return new Double(((Number) argValue).doubleValue() + 1);
}
else {
return new Double(((Number) argValue).doubleValue() - 1);
}
} else {
//
Argument might be a RuntimeException indicating that
//
the cache does not yet have the required cell value. The
//
function will be called again when the cache is loaded.
return null;
}
}
public String[] getReservedWords() {
return null;
}
}
and register two the functions in your schema:
<Schema ...>
...
<UserDefinedFunction name="PlusOne" className="com.example.PlusOrMinusOneUdf">
<UserDefinedFunction name="MinusOne" className="com.example.PlusOrMinusOneUdf">
</Schema>
If you're tired of writing duplicate user-defined function
declarations in schema files, you can pack your user-defined function
implemention classes into a jar file with a embedded
META-INF/services/mondrian.spi.UserDefinedFunction
resource file. This resource file contains class names of
implementations of interface mondrian.spi.UserDefinedFunction, one
name per line. For more details, you may look into
src/main/META-INF/services/mondrian.spi.UserDefinedFunction
in the source distribution and the
Service Provider section of the specification of JAR files.
User-defined functions declared by this means are available to all
mondrian schemas in the JVM.
Caution: you can't define more than one user-defined function
implementations in one class when you declare user-defined functions
in this way. One function will be loaded for each class, and given the
name that the getName()
method returns.
User-defined functions can also be implemented in a scripting language, such as JavaScript. These functions may not perform quite as well as Java UDFs or built-in functions, but they are a lot more convenient to implement.
To define a UDF in script, use the Script element and include within it the followimg functions:
getName()
getDescription()
getSyntax()
getParameterTypes()
getReturnType(parameterTypes)
execute(evaluator, arguments)
The getName(), getDescription(), getReservedWords() and getSyntax() methods are optional; getName() defaults to the name attribute in the UserDefinedFunction element, getDescription() defaults to the name, getReservedWords() returns the empty list, and getSyntax() defaults to mondrian.olap.Syntax.Function. The other methods have similar meanings to the ones in the UserDefinedFunction SPI.
Here is an example of the factorial function as a JavaScript UDF:
<UserDefinedFunction name="Factorial">
<Script language="JavaScript">
function getParameterTypes() {
return new Array(new mondrian.olap.type.NumericType());
}
function getReturnType(parameterTypes) {
return new mondrian.olap.type.NumericType();
}
function execute(evaluator, arguments) {
var n = arguments[0].evaluateScalar(evaluator);
return factorial(n);
}
function factorial(n) {
return n <= 1 ? 1 : n * factorial(n - 1);
}
</Script>
A cell formatter modifies the behavior of
Cell.getFormattedValue()
. The class must implement the
mondrian.spi.CellFormatter
interface, and is specified like this:
<Measure name="name">
<CellFormatter className="com.example.MyCellFormatter"/>
</Measure>
(The previous syntax, using the 'formatter' attribute of the Measure element, is deprecated, and will be removed in mondrian-4.0.)
You can specify a formatter in a scripting language such as JavaScript, using the Script element:
<Measure name="name">
<Script language="JavaScript">
</Script>
</Measure>
The script has available a value
variable,
corresponding to the parameter of
the
mondrian.spi.CellFormatter.formatCell(Object value)
method. The code fragment can have multiple statements, but must end
in a return
statement.
For a calculated member that belongs to a cube or virtual cube, you can also use the CellFormatter element:
<CalculatedMember name="name" dimension="dimension">
<Formula>
[Measures].[Unit Sales] * 2
</Formula>
<Script language="JavaScript">
var s = value.toString();
while (s.length() < 20) {
s = "0" + s;
}
return s;
</Script>
</Measure>
You can also define a formatter by setting the CELL_FORMATTER
property of the member to the name of the formatter class.
<CalculatedMember name="name" formatter="com.example.MyCellFormatter">
<CalculatedMemberProperty name="CELL_FORMATTER" value="com.example.MyCellFormatter"/>
For a calculated measure defined in the WITH MEMBER
clause of an
MDX query, you can set the same property in the MDX to achieve the same effect:
WITH MEMBER [Measures].[Foo]
AS '[Measures].[Unit Sales] * 2',
CELL_FORMATTER='com.example.MyCellFormatter'
SELECT {[Measures].[Unit Sales], [Measures].[Foo]} ON COLUMNS,
{[Store].Children} ON ROWS
FROM [Sales]
To define a scripted formatter, use
the CELL_FORMATTER_SCRIPT
and CELL_FORMATTER_SCRIPT_LANGUAGE
properties:
WITH MEMBER [Measures].[Foo]
AS '[Measures].[Unit Sales] * 2',
CELL_FORMATTER_SCRIPT_LANGUAGE='JavaScript',
CELL_FORMATTER_SCRIPT='var s = value.toString(); while (s.length() < 20) s = "0" + s; return s;'
SELECT {[Measures].[Unit Sales], [Measures].[Foo]} ON COLUMNS,
{[Store].Children} ON ROWS
FROM [Sales]
The cell formatter property is ignored if a member does not belong to the
[Measures]
dimension.
A member formatter modifies the behavior of
Member.getCaption()
. The class must implement
the
mondrian.spi.MemberFormatter
interface, and is specified
like this:
<Level name="name" column="column">
<MemberFormatter className="com.example.MyMemberFormatter"/>
</Level>
(The previous syntax, using the 'formatter' attribute of the Level element, is deprecated, and will be removed in mondrian-4.0.)
You can specify a formatter in a scripting language such as JavaScript, using the Script element:
<Level name="name" column="column">
<Script language="JavaScript">
return member.getName().toUpperCase();
</Script>
</Level>
The script has available a member
variable,
corresponding to the parameter of
the
mondrian.spi.MemberFormatter.formatMember(Member member)
method. The code fragment can have multiple statements, but must end
in a return
statement.
A property formatter modifies the behavior of
Property.getPropertyFormattedValue()
. The class must implement the
mondrian.spi.PropertyFormatter
interface, and is specified like this:
<Attribute name="My Attribute" column="attributeColumn" uniqueMembers="true">
<Property name="My Property" column="propColumn">
<PropertyFormatter className="com.example.MyPropertyFormatter"/>
</Property
<Attribute/>
You can specify a formatter in a scripting language such as JavaScript, using the Script element:
<Level name="name" column="column">
<Property name="MyProp" column="PropColumn">
<Script language="JavaScript">
return member.getName().toUpperCase();
</Script>
</Property>
</Level>
The script has
available member
, propertyName
and propertyValue
variables, corresponding to the
parameters of
the mondrian.spi.PropertyFormatter.formatProperty(Member
member, String propertyName, Object propertyValue) method.
The code fragment can have multiple statements, but must end in
a return
statement.
A dynamic schema processor implements the
mondrian.spi.DynamicSchemaProcessor
interface. It is specified as part of
the connection string, like this:
Jdbc=jdbc:odbc:MondrianFoodMart; JdbcUser=ziggy;
JdbcPassword=stardust; DynamicSchemaProcessor=com.example.MySchemaProcessor
The effect is that when reading the contents of the schema from a URL, Mondrian turns to the schema processor rather than Java's default URL handler. This gives the schema reader the opportunity to run a schema through a filter, or even generate an entire schema on the fly.
When DynamicSchemaProcessor
is specified, schema would
be processed and reloaded on every ROLAP connection request. Property
UseContentChecksum
should be used along with a schema processor
to enable caching of the schema:
DataSource=java:/jdbc/MyWarehouse;
DynamicSchemaProcessor=com.example.MySchemaProcessor;
UseContentChecksum=true
In this case once loaded schema would be cached until it changes. If schema content changes, it is be reloaded. (The newly loaded schema is regarded as a different schema, and will start with empty caches.)
Dynamic schemas are a very powerful construct. As we shall see, an important application for them is internationalization.
The DynamicDatasourceXmlaServlet
extends DefaultXmlaServlet
, adding the ability to dynamically load
the datasources.xml
file. For every client request that it receives, it checks
for updates to the content of datasources.xml
. It selectively
clears cache for catalogs that have changed or no longer exist in datasources.xml
.
The servlet considers a
catalog as changed when either of its properties (DataSourceInfo, definition
properties on
DataSourcesConfig.Catalog) are different. It identifies catalog by name.
This servlet complements the dynamic catalog loading capability based on UseContentChecksum. It does not check the catalog content for updates. There is no overlap in the functionality. Both together will give full dynamic datasource and catalog configuration capability.
To use DynamicDatasourceXmlaServlet, change definition of the MondrianXmlaServlet
servlet in web.xml
:
<servlet>
<servlet-name>MondrianXmlaServlet</servlet-name>
<servlet-class>mondrian.xmla.impl.DynamicDatasourceXmlaServlet</servlet-class>
...
</servlet>
This implementation has a limitation. It requires catalog name to be unique across all datasources and may not work correctly otherwise.
An internationalized Mondrian application would have a schema for each language, where the
caption of each object appears in the local language. For example, the [Product]
dimension would have the caption "Product" in English and "Produit" in French.
It is unwise to translate the actual names of the schema objects, because then the MDX statements would need to be changed also. All that you need to change is the caption. Every schema object (schema, cube, virtual cube, dimension, hierarchy, level, measure, named set) has a caption attribute, and user interfaces such as JPivot and Pentaho Analyzer display the caption rather than the real name. Additionally:
allMemberCaption
attribute
as display value of the "All" member. measuresCaption
attribute. Measures
dimension).One way to create an internationalized application is to create a copy of the schema file for each language, but these are difficult to maintain. A better way is to use the LocalizingDynamicSchemaProcessor class to perform dynamic substitution on a single schema file.
First, write your schema using variables as values for caption
,
description
,
allMemberCaption
and measuresCaption
attributes as follows:
<Schema ... measuresCaption="%{foodmart.measures.caption}">
<Dimension name="Store" caption="%{foodmart.dimension.store.caption}"
description="%{foodmart.dimension.store.description}">
<Hierarchy hasAll="true" allMemberName="All Stores" allMemberCaption="%{foodmart.dimension.store.allmember.caption =All Stores}" primaryKey="store_id"
caption="%{foodmart.hierarchy.store.country.caption}"
description="%{foodmart.hierararchy.store.country.description}>
<Table name="store"/>
<Level name="Store Country" column="store_country" uniqueMembers="true" caption="%{foodmart.dimension.store.country.caption}"
description="%{foodmart.dimension.store.country.description}"/>
<Level name="Store State" column="store_state" uniqueMembers="true" caption="%{foodmart.dimension.store.state.caption}"
description="%{foodmart.dimension.store.state.description}"/>
<Level name="Store City" column="store_city" uniqueMembers="false" caption="%{foodmart.dimension.store.city.caption}"
description="%{foodmart.dimension.store.city.description}"/>
<Level name="Store Name" column="store_name" uniqueMembers="true" caption="%{foodmart.dimension.store.name.caption}"
description="%{foodmart.dimension.store.name.description}">
<Property name="Store Type" column="store_type" caption="%{foodmart.dimension.store. name.property_type.caption}"
description="%{foodmart.dimension.store. name.property_type.description}"/>
<Property name="Store Manager" column="store_manager" caption="%{foodmart.dimension.store. name.property_manager.caption}"
description="%{foodmart.dimension.store. name.property_manager.description}"/>
<Property name="Store Sqft" column="store_sqft" type="Numeric" caption="%{foodmart.dimension.store. name.property_storesqft.caption}"
description="%{foodmart.dimension.store.
name.property_storesqft.description}"/>
<Property name="Grocery Sqft" column="grocery_sqft" type="Numeric"/>
<Property name="Frozen Sqft" column="frozen_sqft" type="Numeric"/>
<Property name="Meat Sqft" column="meat_sqft" type="Numeric"/>
<Property name="Has coffee bar" column="coffee_bar" type="Boolean"/>
<Property name="Street address" column="store_street_address" type="String"/>
</Level>
</Hierarchy>
</Dimension>
<Cube name="Sales" caption="%{foodmart.cube.sales.caption}"
description="%{foodmart.cube.sales.description}">
...
<DimensionUsage name="Store" source="Store" foreignKey="store_id"
caption="%{foodmart.cube.sales.name.caption}"
description="%{foodmart.cube.sales.name.description}"/>
...
<Measure name="Unit Sales" column="unit_sales" caption="%{foodmart.cube.sales.measure.unitsales.caption}"
description="%{foodmart.cube.sales.measure.unitsales.description}"/>
</Cube>
</Schema>
As usual, the default caption for any cube, measure, dimension or
level without a caption
attribute is the name of the
element. A hierarchy's default caption is the caption of its dimension;
for example, the [Store]
hierarchy has no caption
defined, so it inherits the caption
attribute from its
parent, the
[Store]
dimension.
Next, add the dynamic schema processor and locale to your connect string. For example,
Provider=mondrian; Locale=en_US;
DynamicSchemaProcessor=mondrian.i18n.LocalizingDynamicSchemaProcessor;
Jdbc=jdbc:mysql://localhost/foodmart;
JdbcUser=foodmart;
JdbcPassword=foodmart;
Catalog=/WEB-INF/FoodMart.mondrian.xml
Now, for each locale you wish to support, provide a resource file
named
locale_{locale}.properties
. For example,
# locale.properties: Default resources
foodmart.measures.caption=Measures
foodmart.dimension.store.country.caption=Store Country
foodmart.dimension.store.name.property_type.column= store_type
foodmart.dimension.store.country.member.caption= store_country
foodmart.dimension.store.name.property_type.caption =Store Type
foodmart.dimension.store.name.caption =Store Name
foodmart.dimension.store.state.caption =Store State
foodmart.dimension.store.name.property_manager.caption =Store Manager
foodmart.dimension.store.name.property_storesqft.caption =Store Sq. Ft.
foodmart.dimension.store.allmember.caption =All Stores
foodmart.dimension.store.caption =Store
foodmart.cube.sales.caption =Sales
foodmart.dimension.store.city.caption =Store City
foodmart.cube.sales.measure.unitsales =Unit Sales
and
# locale_hu.properties: Resources for the 'hu' locale.
foodmart.measures.caption=Hungarian Measures
foodmart.dimension.store.country.caption=Orsz\u00E1g
foodmart.dimension.store.name.property_manager.caption =\u00C1ruh\u00E1z
vezet\u0151
foodmart.dimension.store.country.member.caption =store_country_caption_hu
foodmart.dimension.store.name.property_type.caption =Tipusa
foodmart.dimension.store.name.caption =Megnevez\u00E9s
foodmart.dimension.store.state.caption =\u00C1llam/Megye
foodmart.dimension.store.name.property_type.column =store_type_caption_hu
foodmart.dimension.store.name.property_storesqft.caption =M\u00E9ret
n.l\u00E1b
foodmart.dimension.store.allmember.caption =Minden \u00C1ruh\u00E1z
foodmart.dimension.store.caption =\u00C1ruh\u00E1z
foodmart.cube.sales.caption =Forgalom
foodmart.dimension.store.city.caption =V\u00E1ros
foodmart.cube.sales.measure.unitsales =Eladott db
Aggregate tables are a way to improve Mondrian's performance when the fact table contains a huge number of rows: a million or more. An aggregate table is essentially a pre-computed summary of the data in the fact table.
Let's look at a simple aggregate table.
<Cube name="Sales">
<Table name="sales_fact_1997">
<AggName name="agg_c_special_sales_fact_1997">
<AggFactCount column="FACT_COUNT"/>
<AggMeasure name="[Measures].[Store Cost]" column="STORE_COST_SUM"/>
<AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/>
<AggLevel name="[Product].[Product Family]" column="PRODUCT_FAMILY"/>
<AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
<AggLevel name="[Time].[Year]" column="TIME_YEAR"/>
<AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
<AggLevel name="[Time].[Month]" column="TIME_MONTH"/>
</AggName>
</Table>
<!-- Rest of the cube definition -->
</Cube>
The <AggForeignKey>
element, not shown here, allows you to reference a dimension table
directly, without including its columns in the aggregate table. It is
described in the
aggregate
tables guide.
In practice, a cube which is based upon a very large fact table may have several aggregate tables. It is inconvenient to declare each aggregate table explicitly in the schema XML file, and luckily there is a better way. In the following example, Mondrian locates aggregate tables by pattern-matching.
<Cube name="Sales">
<Table name="sales_fact_1997">
<AggPattern pattern="agg_.*_sales_fact_1997">
<AggFactCount column="FACT_COUNT"/>
<AggMeasure name="[Measures].[Store Cost]" column="STORE_COST_SUM"/>
<AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/>
<AggLevel name="[Product].[Product Family]" column="PRODUCT_FAMILY"/>
<AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
<AggLevel name="[Time].[Year]" column="TIME_YEAR"/>
<AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
<AggLevel name="[Time].[Month]" column="TIME_MONTH"/>
<AggExclude name="agg_c_14_sales_fact_1997"/>
<AggExclude name="agg_lc_100_sales_fact_1997"/>
</AggPattern>
</Table>
</Cube>
It tells Mondrian to treat all tables which match the pattern "agg_.*_sales_fact_1997"
as aggregate tables, except "agg_c_14_sales_fact_1997"
and
"agg_lc_100_sales_fact_1997"
. Mondrian uses rules to deduce the
roles of the columns in those tables, so it's important to adhere to
strict naming conventions. The naming conventions are described in the
aggregate tables
guide.
The performance guide has advice on choosing aggregate tables.
OK, so now you've got all this great data, but you don't everyone to be able to read all of it. To solve this, you can define an access-control profile, called a Role, as part of the schema, and set this role when establishing a connection.
Roles are defined by <Role>
elements,
which occur as direct children of the <Schema>
element, after the last <Cube>
.
Here is an example of a role:
<Role name="California manager">
<SchemaGrant access="none">
<CubeGrant cube="Sales" access="all">
<DimensionGrant hierarchy="[Measures]" access="all"/>
<HierarchyGrant hierarchy="[Store]" access="custom" topLevel="[Store].[Store Country]">
<MemberGrant member="[Store].[USA].[CA]" access="all"/>
<MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
<HierarchyGrant hierarchy="[Customers]" access="custom" topLevel="[Customers].[State Province]" bottomLevel="[Customers].[City]">
<MemberGrant member="[Customers].[USA].[CA]" access="all"/>
<MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
<HierarchyGrant hierarchy="[Gender]" access="none"/>
</CubeGrant>
</SchemaGrant>
</Role>
A <SchemaGrant>
defines the default access for
objects in a schema. The access
attribute can be "all" or
"none";
this access can be overridden for specific objects. In this case,
because
access="none"
, a user would only be able to browse the
"Sales" cube,
because it is explicitly granted.
A <CubeGrant>
defines the access to
a particular cube. As for <SchemaGrant>
,
the access attribute can be "all", "custom" or "none", and can
be overridden for specific sub-objects in the cube.
A <DimensionGrant>
defines access to a dimension. The access attribute can be "all", "custrom"
or "none". An access level of "all" means that all the child hierarchies
of the dimension will get inherited access. An access level of "custom" means
that the role does not get an inherent access to the child hierarchies,
unless the role is explicitely granted using a
<HierarchyGrant> element.
A <HierarchyGrant>
defines
access to a hierarchy. The access attribute can be "all", meaning all
members
are visible; "none", meaning the hierarchy's very existence is hidden
from the
user; and "custom". With custom access, you can use the topLevel
attribute to define the top level which is visible (preventing users
from seeing
too much of the 'big picture', such as viewing revenues rolled up to
the
Store Country
level); or use the bottomLevel
attribute to
define the bottom level which is visible (here, preventing users from
invading
looking at individual customers' details); or control which sets of
members the
user can see, by defining nested <MemberGrant>
elements.
You can only define a <MemberGrant>
element if its enclosing <HierarchyGrant>
has access="custom"
. Member grants give (or remove)
access to a
given member, and all of its children. Here are the rules:
topLevel="[Store].[Store State]"
,
and grant access to California, you won't be able to see USA.
Member grants do not override the topLevel and bottomLevel attributes.
You can grant or deny access to a member of any level, but the top
and bottom constraints have precedence on the explicit member grants.In the example, the user will have access to California, and all of
the
cities in California except Los Angeles. They will be able to see USA
(because
its child, California, is visible), but no other nations, and not All
Stores
(because it is above the top level, Store Country
).
A rollup policy determines how mondrian computes a member's total
if the current role cannot see all of that member's children. Under the default
rollup policy, called 'full', the total for that member includes contributions
from the children that are not visible. For example, suppose that Fred belongs to a role
that can see [USA].[CA]
and [USA].[OR]
but not [USA].[WA]
.
If Fred runs the query
SELECT {[Measures].[Unit Sales]} ON COLUMNS,
{[[Store].[USA], Store].[USA].Children} ON ROWS
FROM [Sales]
the query returns
[Customer]
[Measures].[Unit Sales]
[USA]
266,773 [USA].[CA]
74,748 [USA].[OR]
67,659
Note that [USA].[WA]
is not returned, per the access-control
policy, but the total includes the total from Washington (124,366) that
Fred cannot see. For some applications, this is not appropriate.
In particular, if the dimension has a small number of members,
the end-user may be able to deduce the values of the members
which they do not have access to.
To remedy this, a role can apply a different rollup policy to a hierarchy. The policy describes how a total is calculated for a particular member if the current role can only see some of that member's children:
rollupPolicy
attribute.Under the 'partial' policy, the [USA]
total is the sum of the
accessible children [CA]
and [OR]
:
[Customer]
[Measures].[Unit Sales]
[USA]
142,407 [USA].[CA]
74,748 [USA].[OR]
67,659
Under 'hidden' policy, the [USA] total is hidden because one of its children is not accessible:
[Customer]
[Measures].[Unit Sales]
[USA]
- [USA].[CA]
74,748 [USA].[OR]
67,659
The policy is specified per role and hierarchy. In the following example, the role sees partial totals for the
[Store]
hierarchy but full totals for [Product]
.
<Role name="South Pacific manager">
<SchemaGrant access="none">
<CubeGrant cube="Sales" access="all">
<HierarchyGrant hierarchy="[Store]" access="custom" rollupPolicy="partial" topLevel="[Store].[Store Country]">
<MemberGrant member="[Store].[USA].[CA]" access="all"/>
<MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
<HierarchyGrant hierarchy="[Customers]" access="custom" rollupPolicy="full" topLevel="[Customers].[State Province]" bottomLevel="[Customers].[City]">
<MemberGrant member="[Customers].[USA].[CA]" access="all"/>
<MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
<HierarchyGrant hierarchy="[Gender]" access="none"/>
</CubeGrant>
</SchemaGrant>
</Role>
This example also shows existing features, such as how hierarchy grants can be restricted using
topLevel
and/or bottomLevel
attributes, and how a role
can be prevented from seeing a hierarchy using access="none".
A union role combines several roles, and has the sum of their privileges.
A union role can see a particular schema object if one or more of its constituent roles can see it. Similarly, the rollup policy of a union role with respect to a particular hierarchy is the least restrictive of all of the roles' rollup policies.
Here is an example showing the syntax of a union role.
<Role name="Coastal manager">
<Union>
<RoleUsage roleName="California manager"/>
<RoleUsage roleName="Eastern sales manager"/>
</Union>
</Role>
The constituent roles "California manager" and "Eastern sales manager" may be
regular roles, user-defined roles or union roles, but they must be declared
earlier in the schema file. The "Coastal manager" role will be able to see any
member that or a "California manager" and "Eastern sales manager". It will be
able to see all the cells at the intersection of these members, plus it will be
able to see cells that neither role can see: for example, if only "California
manager" can see [USA].[CA].[Fresno]
, and only "Eastern sales
manager" see the [Sales Target]
measure, then "Coastal manager"
will be able to see the sales target for Fresno, which neither of the
constituent roles have access to.
A role only has effect when it is associated with a connection. By default, connections have a role which gives them access to every cube in that connection's schema.
Most databases associate roles (or 'groups') with users, and automatically assign them when users log in. However, Mondrian doesn't have the notion of users, so you have to establish the role in a different way. There are two ways of doing this:
Role
keyword in the connect string, the connection will adopt that role. You can
specify multiple role names separated by commas, and a union role will be
created; if a role name contains a comma, escape it with an extra comma. See
class DriverManager
for examples of connect string syntax.Element | Description |
Schema |
|
<Schema> |
The top-level collection of elements relating to a particular enterprise. Cubes belong to a Schema, as do shared Dimensions, Named Sets, Roles, User-defined functions, and Parameters. (Dimensions and Named Sets may also be defined privately within a particular Cube.) |
Physical elements |
|
<PhysicalSchema> |
Collection of table usages, linked via relationships, upon which the logical schema is constructed. |
<Table> |
Defines a use of a table. |
<Query> |
Defines a 'table' using a SQL query, which can have different variants for different underlying databases. |
<InlineTable> |
Defines a table using an inline dataset. |
<Link> |
Link between two tables. |
<ColumnDefs> |
Holder for <ColumnDef> elements. |
<ColumnDef> |
Definition of a column in an <InlineTable> dataset. |
<ExpressionView> |
Collection of SQL expressions for a <CalculatedColumnDef>
or <Query> , one for each supported dialect. |
<Column> |
Reference to a column in a SQL expression (inside a
<SQL> element) or in a key structure (<Key> ,
<ForeignKey> , <Name> , <Caption> ,
<OrderBy> ). |
<SQL> |
The SQL expression for a particular database dialect. |
<Rows> |
Holder for <Row> elements. |
<Row> |
Row in an <InlineTable> dataset. |
<Value> |
Value of a column in an <InlineTable> dataset. |
Logical elements |
|
<Cube> |
The main business view for an analysis, a cube is a collection of dimensions and measures. |
<Dimensions> |
Collection of <Dimension> elements. |
<Dimension> |
A collection of <Attribute> s that may be used to subdivide a
multidimensional data set, some of them organized into
<Hierarchy> structures. |
<Attributes> |
Collection of <Attribute> elements. |
<Attribute> |
Data item in a <Dimension> . |
<Hierarchies> |
Collection of <Hierarchy> elements. |
<Hierarchy> |
Collection of <Attributes> organized into <Levels> . |
<Levels> |
Collection of <Level> elements. |
<Level> |
Level of a <Hierarchy> . |
<Key> |
Column or columns that are the key of an <Attribute> , in lieu of Attribute.keyColumn . |
<Name> |
Column that holds the name of an <Attribute> , in lieu of Attribute.nameColumn . |
<Caption> |
Column that holds the caption of an <Attribute> , in lieu of Attribute.captionColumn . |
<OrderBy> |
Column or columns that used to sort members of an <Attribute> , in lieu of Attribute.orderByColumn . |
<Property> |
Member property. The definition is against a hierarchy or level, but the property will be available to all members. |
<MeasureGroups> |
Collection of <MeasureGroup> elements. |
<MeasureGroup> |
Collection of <Measure> s that are stored in the same fact table. |
<Measures> |
Collection of <Measure> elements. |
<Measure> |
Measure. |
<DimensionLinks> |
Collection of links between a <MeasureGroup> and the <Dimension> s of the <Cube> . |
<ForeignKeyLink> |
Links a <Dimension> to a <MeasureGroup> using a foreign key
from the fact table to the dimension table. |
<FactLink> |
Declares that a Dimension is trivially linked to a MeasureGroup because its dimension table is the same as the fact table. |
<CopyLink> |
Indicates dimension that is represented by key columns in the MeasureGroup's table. |
<ReferenceLink> |
Links a Dimension to a MeasureGroup via another Dimension. |
<NoLink> |
Documents that a <Dimension> is not linked to the current <MeasureGroup> . |
<CalculatedMembers> |
Collection of <CalculatedMember> elements. |
<CalculatedMember> |
A member whose value is derived using a formula, defined as part of a cube. |
<NamedSets> |
Collection of <NamedSet> elements. |
<NamedSet> |
A set whose value is derived using a formula, defined as part of a cube. |
<Closure> |
Maps a parent-child hierarchy onto a closure table. |
Access control |
|
<Role> |
An access-control profile. |
<SchemaGrant> |
A set of rights to a schema. |
<CubeGrant> |
A set of rights to a cube. |
<HierarchyGrant> |
A set of rights to a <Hierarchy> and <Level> s within that Hierarchy. |
<MemberGrant> |
A set of rights to a <Member> and its children. |
<Union> |
Definition of a set of rights as the union of a set of <Role> s. |
<RoleUsage> |
A reference to a <Role> . |
Extensions |
|
<UserDefinedFunction> |
Declares a user-defined function. |
<CellFormatter> |
Cell formatter. |
<MemberFormatter> |
Member formatter. |
<PropertyFormatter> |
Property formatter. |
<Script> |
Script fragment to implement an SPI such as user-defined function, member formatter, or cell formatter. |
Miscellaneous |
|
<Annotations> |
Holder for annotations. |
<Annotation> |
User-defined property attached to a metadata element. |
<Parameter> |
Part of the definition of a Hierarchy; passed to a MemberReader, if present. |
<CalculatedMemberProperty> |
Property of a calculated member. |
<Formula> |
Holds the formula text within a <NamedSet> or <CalculatedMember> . |
<MeasureExpression> |
SQL expression used to compute a measure, in lieu of Measure.column . |
Author: Julian Hyde; last modified October 2012.
Copyright (C) 2001-2005 Julian Hyde.
Copyright (C) 2005-2013 Pentaho and others