XmlDomDataContext

XmlDomDataContext is no longer recommended, except for exploratory use. It loads the complete DOM of an XML file into memory and models a table for each element that contains sub-elements or attributes with value types. This makes it a good vehicle for visualizing or demonstrating the ability to read XML files, but it also makes the resulting schema very content-dependent: different XML files may not get the same schema model even if they share the same XSD.

XmlSaxDataContext

The XmlSaxDataContext is the newer and much improved DataContext implementation for XML files.

Since XML files are hierarchical and MetaModel tables are tabular, you need to do some mapping. MetaModel provides a mapping model that is XPath based, with a few slight modifications.

Assume we have the following XML document:

 <?xml version="1.0" encoding="UTF-8"?>
 <root>
         <organization type="governmental">
                 <name>Company A</name>
                 <employees>
                         <employee>
                                 <name>John Doe</name>
                                 <gender>M</gender>
                         </employee>
                         <employee>
                                 <name>Jane Doe</name>
                                 <gender>F</gender>
                         </employee>
                 </employees>
         </organization>
 
         <organization type="company">
                 <name>Company B</name>
                 <employees>
                         <employee>
                                 <name>Peter</name>
                                 <gender>M</gender>
                         </employee>
                         <employee>
                                 <name>Bob</name>
                                 <gender>M</gender>
                         </employee>
                 </employees>
         </organization>
 </root>

Now imagine that you want to have a table of employee names and gender information, and another table with company name and type information. We define our DataContext and those tables like this:  

XmlSaxTableDef employeeTableDef = new XmlSaxTableDef(
   "/root/organization/employees/employee",
   new String[] {
     "/root/organization/employees/employee/name",
     "/root/organization/employees/employee/gender"
   }
);
 
XmlSaxTableDef organizationTableDef = new XmlSaxTableDef(
   "/root/organization",
   new String[] {
     "/root/organization/name",
     "/root/organization@type"
   }
);
 
DataContext dc = new XmlSaxDataContext(new File("my_file.xml"), employeeTableDef, organizationTableDef);

As you can see, we simply provide XPath-style expressions to 1) define the record scope and 2) define the paths of the individual values (in other words, the column definitions). If you query those tables, you will get datasets like these:

Table: /employee

row_id  /name     /gender
------  --------  -------
0       John Doe  M
1       Jane Doe  F
2       Peter     M
3       Bob       M

 

Table: /organization

row_id  /name      @type
------  ---------  ------------
0       Company A  governmental
1       Company B  company
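The mapping from a record scope and value paths to rows can be pictured with a small JDK-only sketch. It is an illustration under assumptions, not MetaModel's implementation: the class name RecordScopeSketch and the method extract are hypothetical, and only the path notation (including the "@type" attribute form) is taken from the table definitions above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is NOT how MetaModel is implemented.
final class RecordScopeSketch {

    /** Streams the XML once and returns one row (value path -> text) per record-scope element. */
    static List<Map<String, String>> extract(String xml, String scope, String... valuePaths) {
        List<Map<String, String>> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String path = "";                    // path of the element being visited
            private final StringBuilder text = new StringBuilder();
            private Map<String, String> row;             // non-null while inside a record scope

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path = path + "/" + qName;
                text.setLength(0);
                if (path.equals(scope)) {
                    // A new record starts: prepare one empty cell per value path.
                    row = new LinkedHashMap<>();
                    for (String p : valuePaths) {
                        row.put(p, null);
                    }
                }
                if (row != null) {
                    // Attribute values use the "element-path@attribute" notation.
                    for (int i = 0; i < atts.getLength(); i++) {
                        String attPath = path + "@" + atts.getQName(i);
                        if (row.containsKey(attPath)) {
                            row.put(attPath, atts.getValue(i));
                        }
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (row != null && row.containsKey(path)) {
                    row.put(path, text.toString().trim());
                }
                if (path.equals(scope)) {
                    rows.add(row);                       // the record is complete
                    row = null;
                }
                path = path.substring(0, path.lastIndexOf('/'));
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<root><organization type=\"governmental\"><name>Company A</name>"
                + "<employees><employee><name>John Doe</name><gender>M</gender></employee></employees>"
                + "</organization></root>";
        System.out.println(extract(xml, "/root/organization",
                "/root/organization/name", "/root/organization@type"));
    }
}
```

Because the document is streamed rather than loaded as a DOM, only the current record needs to be held in memory at any time, which is the main appeal of a SAX-based approach.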

This is nice, but you might be thinking: How can I then join these tables? There doesn't seem to be any cross-reference value that we can join or perform lookups by.  

To solve this issue, MetaModel provides a modification for XPath, the index(...) function. Say we want to add the organization's id to the employee table (as a foreign key). To achieve that, we will need this modified employee table definition (notice the third value XPath expression):

XmlSaxTableDef employeeTableDef = new XmlSaxTableDef(
   "/root/organization/employees/employee",
   new String[] {
     "/root/organization/employees/employee/name",
     "/root/organization/employees/employee/gender",
     "index(/root/organization)"
   }
);

Now if you query the employees table, this will be your result:  

 

row_id  /name     /gender  index(/root/organization)
------  --------  -------  -------------------------
0       John Doe  M        0
1       Jane Doe  F        0
2       Peter     M        1
3       Bob       M        1
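One way to picture what index(...) yields is as a zero-based counter of the elements seen so far at the given path, sampled each time a new record starts. The JDK-only sketch below works under that assumption; the names IndexFunctionSketch and indexColumn are hypothetical and it does not reflect MetaModel's internals.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of an index(...) column; NOT MetaModel's implementation.
final class IndexFunctionSketch {

    /** For every element at recordScope, returns the zero-based index of the enclosing indexScope element. */
    static List<Integer> indexColumn(String xml, String recordScope, String indexScope) {
        List<Integer> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String path = "";
            private int index = -1;   // becomes 0 when the first indexScope element opens

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path = path + "/" + qName;
                if (path.equals(indexScope)) {
                    index++;              // another element at the indexed path has started
                }
                if (path.equals(recordScope)) {
                    result.add(index);    // sample the counter for this record
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                path = path.substring(0, path.lastIndexOf('/'));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<organization><employees><employee/><employee/></employees></organization>"
                + "<organization><employees><employee/><employee/></employees></organization>"
                + "</root>";
        System.out.println(indexColumn(xml, "/root/organization/employees/employee", "/root/organization"));
    }
}
```

Under this reading, the counter lines up with the row_id values of a table whose record scope is the indexed path, which is what makes the column usable as a foreign key.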

Moving on, you will be able to define both joins and lookups using this foreign key. For example:

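  // Look up the two tables by name (assuming the table names shown earlier)
  Table employeeTable = dc.getDefaultSchema().getTableByName("/employee");
  Table organizationTable = dc.getDefaultSchema().getTableByName("/organization");

  // The foreign key and the columns to join on and select
  Column fk = employeeTable.getColumnByName("index(/root/organization)");
  Column empName = employeeTable.getColumnByName("/name");
  Column orgId = organizationTable.getColumnByName("row_id");
  Column orgName = organizationTable.getColumnByName("/name");
 
  Query q = dc.query().from(employeeTable)
   .innerJoin(organizationTable).on(fk, orgId)
   .select(empName).as("employee")
   .select(orgName).as("company").toQuery();
  DataSet ds = dc.executeQuery(q);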

The contents of this queried dataset will now be:  

 

employee  company
--------  ---------
John Doe  Company A
Jane Doe  Company A
Peter     Company B
Bob       Company B
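What the join computes can also be spelled out in plain Java: each employee row carries a foreign key, and that key is matched against the row_id (the zero-based position) of an organization row. The sketch below is only meant to make this relationship concrete; IndexJoinSketch and its join method are hypothetical names and do not use the MetaModel API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an inner join on "foreign key = row_id"; not MetaModel code.
final class IndexJoinSketch {

    /** Pairs each employee name with the name of the organization whose row_id equals the employee's key. */
    static List<String[]> join(String[] employeeNames, int[] employeeOrgIndex, String[] organizationNames) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < employeeNames.length; i++) {
            int fk = employeeOrgIndex[i];
            // The row_id of an organization is its position in organizationNames.
            // Inner join semantics: employees without a matching organization are dropped.
            if (fk >= 0 && fk < organizationNames.length) {
                rows.add(new String[] { employeeNames[i], organizationNames[fk] });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = join(
                new String[] { "John Doe", "Jane Doe", "Peter", "Bob" },
                new int[] { 0, 0, 1, 1 },
                new String[] { "Company A", "Company B" });
        for (String[] row : rows) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```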
