Differences between revisions 3 and 4
Revision 3 as of 2017-08-03 03:28:10
Size: 771
Editor: gmora1223
Comment:
Revision 4 as of 2017-08-11 22:12:43
Size: 6884
Editor: gmora1223
Comment:
/* This is a detailed explanation of the changes made in Apache Marmotta while moving to RDF4J. First, we explain the changes from Sesame 2.7.16 to 2.8.11. Second, .. Finally, */
According to the project proposal sent to the Google Summer of Code program, and as discussed with the Marmotta community, the migration from Sesame 2.7 to RDF4J was split into three partial migrations through intermediate versions. This page is a detailed explanation of the changes made in Apache Marmotta while moving to RDF4J. First, we explain the changes from Sesame 2.7.16 to 2.8.11; second, from Sesame 2.8.11 to Sesame 4; and finally, from Sesame 4 to RDF4J 2.

<<TableOfContents(3)>>
The move from Sesame 2.7 to RDF4J follows a migration plan that was drawn up with the help of the Marmotta community. The detailed plan is in the [[https://docs.google.com/document/d/1oql-LqxpAeTXHVp_2GEj-69yzKVkw0q_LuOoP7aVG9s/edit?usp=sharing|proposal]] sent to [[https://summerofcode.withgoogle.com/|GSoC 2017]]. The migration has three steps: first from Sesame 2.7.16 to 2.8.11, then to Sesame 4, and finally to RDF4J. Each step brought many code changes, version conflicts, broken tests, and package and class problems. What I'd like to do here is give a quick summary of the changes made in Marmotta and how we managed to get to RDF4J.
== Road to Sesame 2 ==
1. The [[http://archive.rdf4j.org/javadoc/sesame-2.8.8/org/openrdf/query/algebra/evaluation/impl/EvaluationStrategyImpl.html|Evaluation Strategy]] uses a [[http://archive.rdf4j.org/javadoc/sesame-2.8.8/org/openrdf/query/algebra/evaluation/federation/FederatedServiceResolver.html|Federated Service Resolver]], which manages a set of federated services. The KiWi Sail connection uses it, but no service is specified by default.

2. Sesame 2.8 implements [[http://archive.rdf4j.org/javadoc/sesame-2.8.11/org/openrdf/IsolationLevels.html|Isolation Levels]], so KiWi can now add its own implementations. When an isolation level is required, we use [[http://archive.rdf4j.org/javadoc/sesame-2.8.11/org/openrdf/IsolationLevels.html#SNAPSHOT_READ|SNAPSHOT_READ]] as the default level.

3. Adopt the Sesame 2.8 convention that every literal has a datatype: xsd:string or rdf:langString must be assigned if the type is null. Null contexts are also allowed. These changes solve [[https://issues.apache.org/jira/browse/MARMOTTA-39|MARMOTTA-39]].

4. The hash code of a literal includes its language and datatype, and it is not necessary to calculate separate hashes for objects.

5. Fix a problem when comparing sizes before commit. The error appeared when two equal statements were added and the last addition had not been committed yet: the size should be 1, but 2 was reported. The cause was that the ''getSize()'' method returned the database size plus the size of the pending batch, and because the two statements were the same, the result was 2 instead of 1.

6. Some of the new tests revealed errors with a subselect inside an optional statement. This could be related to [[https://issues.apache.org/jira/browse/MARMOTTA-603|MARMOTTA-603]].
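
The size problem from item 5 can be illustrated with a minimal sketch in plain Java. This is not KiWi's actual code: the class ''BatchedStore'', its methods and the use of strings as statements are assumptions made only for illustration. The idea is that a pending statement that is already in the database must not be counted a second time.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical store with committed statements and a batch of uncommitted additions. */
class BatchedStore {
    private final Set<String> committed = new HashSet<>();
    private final Set<String> batch = new LinkedHashSet<>();

    /** Queues a statement; it only reaches the database on commit(). */
    void add(String statement) {
        batch.add(statement);
    }

    /** Flushes the pending batch into the committed set. */
    void commit() {
        committed.addAll(batch);
        batch.clear();
    }

    /** Naive size: database size plus batch size, so duplicates are counted twice. */
    int naiveSize() {
        return committed.size() + batch.size();
    }

    /** Size that only counts pending statements not already committed. */
    int size() {
        int pendingNew = 0;
        for (String s : batch) {
            if (!committed.contains(s)) {
                pendingNew++;
            }
        }
        return committed.size() + pendingNew;
    }

    public static void main(String[] args) {
        BatchedStore store = new BatchedStore();
        store.add("<s> <p> <o>");
        store.commit();
        store.add("<s> <p> <o>"); // same statement again, not committed yet
        System.out.println("naive size: " + store.naiveSize());
        System.out.println("fixed size: " + store.size());
    }
}
```

In a real store the duplicate check would have to run against the database rather than an in-memory set, so this only shows the counting logic.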

== Road to Sesame 4 ==

1. Use [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/impl/SimpleValueFactory.html|SimpleValueFactory]] instead of [[http://archive.rdf4j.org/javadoc/sesame-2.1.2/org/openrdf/model/impl/ValueFactoryImpl.html|ValueFactoryImpl]], and stop instantiating [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/impl/URIImpl.html|URIImpl]], [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/impl/StatementImpl.html|StatementImpl]], [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/impl/BNodeImpl.html|BNodeImpl]], etc. directly, because they are deprecated; create values through the value factory instead.

3. For parsers, use [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/index.html?org/openrdf/model/impl/BNodeImpl.html|AbstractRDFParser]] instead of [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/rio/helpers/RDFParserBase.html|RDFParserBase]].

4. Use [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/impl/AbstractValueFactory.html|AbstractValueFactory]] when extending the value factory.

5. Use [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/query/algebra/helpers/AbstractQueryModelVisitor.html|AbstractQueryModelVisitor]] instead of [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/query/algebra/helpers/QueryModelVisitorBase.html|QueryModelVisitorBase]]; in general, stop using deprecated interfaces and use the new ones.

6. Replace [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/URI.html|URI]] with [[http://archive.rdf4j.org/javadoc/sesame-4.1.2/org/openrdf/model/IRI.html|IRI]].

7. Deal with ''Optional'' values in literals.

8. Statement hashes are generated with contexts, and the hash code of a literal is generated from the label only; it does not include language or datatype.

9. Do not auto-register functions, since they are already registered with the ServiceRegistry.

10. There's a problem with KiWi; it was not built for special literals like ''+inf'', ''-inf'' or ''NaN''.

11. The dependency ''jsonld-java-sesame'' is now included in Sesame 4 as the dependency ''sesame-rio-jsonld''.

12. Move to Java 8.
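
Items 7 and 8 can be sketched without any Sesame dependency. The class ''Lit'' below is a hypothetical stand-in for a string literal, not the Sesame ''Literal'' interface. It only mimics two ideas from the list: the language tag is exposed as a ''java.util.Optional'' instead of a nullable string, and the hash code is derived from the label alone. As a simplification it handles only plain and language-tagged strings.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical string literal: a label plus an optional language tag. */
final class Lit {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
    static final String RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    private final String label;
    private final String language; // null when the literal has no language tag

    Lit(String label, String language) {
        this.label = Objects.requireNonNull(label);
        this.language = language;
    }

    String getLabel() {
        return label;
    }

    /** Language tag wrapped in an Optional, so callers never see null. */
    Optional<String> getLanguage() {
        return Optional.ofNullable(language);
    }

    /** rdf:langString when a language tag is present, xsd:string otherwise. */
    String getDatatype() {
        return getLanguage().isPresent() ? RDF_LANGSTRING : XSD_STRING;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Lit)) {
            return false;
        }
        Lit other = (Lit) o;
        return label.equals(other.label) && getLanguage().equals(other.getLanguage());
    }

    /** Hash from the label only; equal literals still get equal hashes. */
    @Override
    public int hashCode() {
        return label.hashCode();
    }

    /** Renders the literal roughly in Turtle style (no escaping). */
    String render() {
        return getLanguage()
                .map(lang -> "\"" + label + "\"@" + lang)
                .orElse("\"" + label + "\"");
    }

    public static void main(String[] args) {
        System.out.println(new Lit("hello", "en").render());
        System.out.println(new Lit("hello", null).render());
    }
}
```

Two literals with the same label but different language tags share a hash code here and are still told apart by ''equals()'', which is valid under the Java hashing contract.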

== Road to RDF4J ==
1. Change package names.

2. LDP works, but its tests are ignored until an alternative LDP test suite is found.

3. The Accumulo backend might not work properly because the dependency [[https://groups.google.com/forum/#!topic/gremlin-users/CtSLC64gKZA|TinkerPop]] no longer supports ''GraphSail''. However, this is an experimental backend that could be removed in future versions.

4. Replace deprecated classes and methods.

5. Upgrade JUnit to version 4.12.

6. Ignore the test that loads a huge dataset.
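
The package change in item 1 is mostly mechanical. The sketch below assumes that Sesame packages under ''org.openrdf'' map to ''org.eclipse.rdf4j''; this mapping is not spelled out on this page, and individual classes may need manual attention. The class ''PackageRenamer'' is a hypothetical helper written for illustration, not part of Marmotta or RDF4J.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that rewrites Sesame package references to RDF4J ones. */
class PackageRenamer {
    // Assumption: everything under org.openrdf moves to org.eclipse.rdf4j.
    private static final Pattern SESAME_PREFIX = Pattern.compile("\\borg\\.openrdf\\.");

    /** Returns the source text with the Sesame package prefix replaced. */
    static String rename(String source) {
        return SESAME_PREFIX.matcher(source).replaceAll("org.eclipse.rdf4j.");
    }

    public static void main(String[] args) {
        System.out.println(rename("import org.openrdf.model.IRI;"));
    }
}
```

The same replacement would also have to be applied to configuration files and service registrations that mention the old package names.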

== Conclusions ==
To finish, I'd like to give some final thoughts. I've tested Marmotta with H2 and PostgreSQL; it is fully functional and the tests work too. Don't forget to use JDK 8. If you are skeptical and want to try it on your own, please clone the [[https://github.com/gmora1223/marmotta/tree/MARMOTTA-659|MARMOTTA-659]] branch. This is only a summary of the main things that have been done in Marmotta to upgrade to RDF4J; it looks easier than it actually was. Along the way I learned a lot about how Marmotta is built at its core, and I would love to continue contributing.
Finally, I want to list some things that I think must be done.

1. After merging into the develop branch, MARMOTTA-39 should be closed.

2. Some new tests in Sesame revealed failures with KiWi, so take a look at MARMOTTA-603.

3. Special values like ''+inf'', ''-inf'' or ''NaN'' should be considered.

4. Find an alternative to the LDP test suite.

5. Deal with deprecated dependencies and experimental backends (this could be solved with the last edit).

6. Decide if the test that loads a big dataset should be ignored.

7. Some modules still use the word Sesame, so IMHO they should be changed to RDF4J.
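
For item 3, a minimal sketch of how the special values could be recognized before they reach the database. It is not KiWi code: the class ''XsdDoubles'' and its methods are hypothetical, and it assumes the XML Schema lexical forms ''INF'', ''+INF'', ''-INF'' and ''NaN'' for xsd:double. Other input is simply handed to ''Double.parseDouble'', which is more permissive than XML Schema.

```java
/** Hypothetical parser for xsd:double lexical forms, including special values. */
class XsdDoubles {

    /** Maps the special lexical forms explicitly and delegates the rest. */
    static double parse(String lexical) {
        String s = lexical.trim();
        switch (s) {
            case "INF":
            case "+INF":
                return Double.POSITIVE_INFINITY;
            case "-INF":
                return Double.NEGATIVE_INFINITY;
            case "NaN":
                return Double.NaN;
            default:
                return Double.parseDouble(s);
        }
    }

    /** True for values that a column accepting only finite numbers cannot hold. */
    static boolean isSpecial(double value) {
        return Double.isNaN(value) || Double.isInfinite(value);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.5", "INF", "-INF", "NaN"}) {
            double d = parse(s);
            System.out.println(s + " special=" + isSpecial(d));
        }
    }
}
```

A check like ''isSpecial'' would let the storage layer decide how to persist such literals instead of failing on them.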
'''EDIT:''' It seems that TinkerPop 3.x could get an implementation of GraphSail. Some of the latest [[https://groups.google.com/forum/#!topic/gremlin-users/CtSLC64gKZA|messages]] from the community say that they are working on it, so stay tuned.


MARMOTTA-659/MigrationPath (last edited 2017-08-11 22:12:43 by gmora1223)