Differences between revisions 13 and 14
Revision 13 as of 2011-11-23 15:15:28
Size: 10967
Editor: 62
Comment: add ContributorGroup link
Revision 14 as of 2018-12-19 21:23:10
Size: 0
Editor: JoanTouzet
Comment: Oracle-to-JSON export is native in Oracle now, see https://www.foxinfotech.in/2018/08/export-data-into-json-file-in-oracle-11g-using-pl-sql.html , importing into CouchDB is then trivial
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
<<Include(EditTheWiki)>>

= Dump an Oracle Table to CouchDB using Python =
Since CouchDB requires unique keys to access documents, importing data from an SQL database is an incitement to deep thought.

If you went to the trouble of creating a single unique key in your SQL database, you may be inclined to use it as your couchdb _id. Hard to go wrong there. However, for certain groups of documents (say a collection of blog posts), simply using an IsoFormattedDateAsDocId is pretty handy and can greatly simplify writing views in both Javascript and Python.

The first two examples expect a unique key to be extracted from the SQL database, the third example lets CouchDB create the unique key. Of the three, the first and last are the most useful. The second example is for static or "throwaway tables" that have very few values, but might be a pain to just hand-copy (ex. config data).

You'll need the python couchdb module from http://code.google.com/p/couchdb-python/ , plus whatever DBAPI2.0 compliant module you use to connect to your favorite SQL database. These examples use Oracle, but it should be trivial to convert.

== Unique Values in SQL Table Converted to Individual Documents ==
Cro-Magnon simple python script which will take the contents of an SQL table and place it into CouchDB.

This particular example depends on the SQL table having one unique field. The values in the unique field become individual documents in the CouchDB database.

This is probably not what you want. :-)

{{{#!python
#!/usr/bin/env python
import couchdb, cx_Oracle, ConfigParser, os, datetime

# Convert a list of (sql tables, unique identifiers) into couchDB documents
# Simply figure out which db you wish to import from, and a series of tables,
# with unique fields you'd like to have them
# indexed by, put them into the list of tuples below and let fly.
# I store all my config stuff in an ini file locally. you'll have to handle
# your own connection to your db.
# This is pretty quick-and-dirty and I don't recommend it for any sort of
# production anything at all! :-)
# Should work for any db api 2 compliant sql database.
# Script guaranteed 100% slower than christmas.

db_name = 'mydatabase'
table_names = [ ('TABLE_NAME0', 'UNIQUE_FIELD'),
                ('TABLE_NAME1', 'UNIQUE_FIELD'),
                ('TABLE_NAME2', 'UNIQUE_FIELD') ]

class GrabbyMitts(object):
    def __init__( self, db_name ):
        config = ConfigParser.ConfigParser()
        config.optionxform = str
        config.read( [ "sqlgen.ini", os.path.expanduser("~/.sqlgen.ini" ) ] )

        # oracle connection
        self.connection = cx_Oracle.Connection( config.get("Oracle", "login") )

        # couchdb location
        self.couch = couchdb.Server( "http://localhost:5984/" )
        try:
            self.db = self.couch.create( db_name )
        except:
            self.db = self.couch[ db_name ]

    def description( self ):
        # get a description of a given table
        # returns the "header" information in list
        query = "select * from %s where 1=0"%self.table_name
        cursor = cx_Oracle.Cursor( self.connection )
        cursor.execute( query )
        description = [ i[0] for i in cursor.description ]
        cursor.close()
        return description

    def uniques( self ):
        # unique value in sql table to create couchdb document ID
        cursor = cx_Oracle.Cursor( self.connection )
        query = "select %s from %s"%( self.mykey, self.table_name )
        cursor.execute( query )
        myuniques = [ i[0] for i in cursor.fetchall() if i[0] ]
        cursor.close()
        return myuniques

    def updateCouch( self, table_and_key ):
        # populate or update couchdb documents using sql table and unique
        # identifier
        self.table_name, self.mykey = table_and_key
        cursor = cx_Oracle.Cursor( self.connection )
        documents = []
        header = self.description()

        query = """
            select %s
            from %s
            where %s=:myunique"""""%( ", ".join( header ),
                                    self.table_name,
                                    self.mykey )
        cursor.prepare( query )

        for myunique in [ { "myunique": i } for i in self.uniques() ]:
            cursor.execute( None, myunique )
            entry = dict( zip( header, cursor.fetchone() ) )
            # mop up datetime objects as they occur, since json cries foul.
            # will probably need to convert to unix epochal time
            for k, v in entry.items():
                if isinstance( v, datetime.datetime ):
                    entry[ k ] = str( v )

            if str( myunique[ 'myunique' ] ) not in self.db:
                self.db[ str( myunique[ 'myunique' ] ) ] = entry
            else:
                doc = self.db[ str( myunique[ 'myunique' ] ) ]
                doc.update( entry )
                self.db[ str( myunique[ 'myunique' ] ) ] = doc

        cursor.close()

if __name__ == "__main__":
    gm = GrabbyMitts( db_name )
    for table_name in table_names:
        gm.updateCouch( table_name )
}}}
== Unique Values in SQL Table Converted to a Single Document ==
Differs from above in that it places all table data into a single document.

{{{#!sql

select username, shoe_size, nostril_count, owns_weather_ballon from humans;

username | shoe_size | nostril_count | owns_weather_balloon

cletus | 10 | 3 | y
}}}
becomes:

{{{#!python

db[ 'humans' ]{'username': 'cletus','shoe_size': 10,'nostril_count': 3,'owns_weather_balloon': 'y' }
}}}
This probably isn't what you want either! Script does no checking to make sure that your particular value is unique.

{{{#!python
# HEY! THIS VERSION CAN MAKE A MOTHER-HUGE DOCUMENT! KNOW WHAT YOU ARE DOING!!!
# Convert a list of (sql tables, unique identifiers) into a single couchDB document
# Switch out with updateCouch() above

    def updateCouch( self, table_and_key ):
        # populate or update couchdb documents using sql table and unique
        # identifier
        # HEY! THIS CAN MAKE A MOTHER-HUGE DOCUMENT! KNOW WHAT YOU ARE DOING!!!
        self.table_name, self.mykey = table_and_key

        cursor = cx_Oracle.Cursor( self.connection )
        documents = []
        header = self.description()

        query = """
            select %s
            from %s
            order by %s"""""%( ", ".join( header ),
                                    self.table_name,
                                    self.mykey )

        cursor.execute( query )
        results = dict(
                str( row[0] ), dict( zip( header, row ) )
                  for row in cursor.fetchall()
                )

        # clean up any datetime fields
        for row in results:
            for field, value in results[ row ].items():
                if isinstance( value, datetime.datetime ):
                    results[ row ][ field ] = str( value )
        self.db[ self.table_name ] = results

        cursor.close()
}}}

== All Values from SQL Table Imported, Let CouchDB Assign Keys ==

This is most likely what you want!

{{{#!python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# You can get this from https://github.com/lysdexia/Oracle2CouchDB
#################################################################
# USAGE: Oracle2CouchDB
# Edit the o2c.ini file, in the [oracle] section, supply your oracle
# authentication values, the oracle table to be cloned and an optional
# larger cursor_arraysize value. (A larger arraysize will often be
# advantageous when using fetchmany())
# supply corresponding data for couchdb in the [couch] section
#################################################################

import cx_Oracle, datetime, sys, types, urllib2, ConfigParser, types
from subprocess import Popen, PIPE
import couchdb, couchdb.design
couchdb.json.use('cjson')

class OraSuck(object):
    def get_connection(self, db):
        self.connection = cx_Oracle.Connection(tns)
        self.cursor = cx_Oracle.Cursor(self.connection)

    def header(self, table_name):
        self.cursor.execute("select * from %s where 1=0"%table_name)
        return [i[0].lower() for i in self.cursor.description]

    def sample(self):
        rows = []
        hdr = self.header(oratable)
        cmd = """select %s from %s"""%(", ".join(hdr), oratable)

        # if we have no specified arraysize, we'll use the fetchmany() default
        # of fifty records.
        if cursor_arraysize:
            self.cursor.arraysize = int(cursor_arraysize)
        self.cursor.prepare(cmd)
        self.cursor.execute(cmd)
        while True:
            rows = [dict(zip(hdr, r))
                    for r in self.cursor.fetchmany()]
            for row in rows:
                for key in row:
                    # try to convert DATE to standard ISO format
                    if hasattr(row[key], "isoformat"):
                        row[key] = row[key].isoformat()
            yield rows

class CouchBlow(object):

    def connect(self):
        couch = couchdb.Server(couch_server)

        if not all([username, password]):
            sys.exit("You'll need a username and password")

        couch.resource.credentials = (username, password)
        try:
            db = couch[oratable]
        except:
            db = couch.create(oratable)
        return db

    # use ora.sample() generator to load db
    def load(self):
        get_rows = ora.sample()
        while get_rows:
            rows = get_rows.next()
            for row in rows:
                try:
                    self.db.save(row)
                    print (row["searchtime"])
                except couchdb.http.ResourceConflict, error:
                    err = "%s: %s"%(error.message)
                    print ("%s: %s"%(row["searchtime"], err))

if __name__ == "__main__":
    config = ConfigParser.ConfigParser()
    config.optionxform = str
    config.read("o2c.ini")

    couch_server = config.get("couch", "couch_server")
    username = config.get("couch", "username")
    password = config.get("couch", "password")

    oratable = config.get("oracle", "oratable")
    oradb = config.get("oracle", "oradb")
    tns = "@".join([config.get("oracle", "tns"), oradb])
    cursor_arraysize = config.get("oracle", "cursor_arraysize")

    cb = CouchBlow()
    cb.db = cb.connect()

    ora = OraSuck()
    ora.get_connection(oradb)

    cb.load()
}}}

Example .ini file

{{{#!python
[oracle]
tns = scott/tiger
oradb = mydb
oratable = mytable
cursor_arraysize = 256

[couch]
couch_server = http://my.couchdb:5984
username = dragon_lady
password = what part of get thee gone do you not understand
}}}