You need to be added to the ContributorsGroup to edit the wiki. But don't worry! Just email any Mailing List or grab us on IRC and let us know your user name.
Dump an Oracle Table to CouchDB using Python
Since CouchDB requires unique keys to access documents, importing data from an SQL database is an incitement to deep thought.
If you went to the trouble of creating a single unique key in your SQL database, you may be inclined to use it as your couchdb _id. Hard to go wrong there. However, for certain groups of documents (say a collection of blog posts), simply using an IsoFormattedDateAsDocId is pretty handy and can greatly simplify writing views in both Javascript and Python.
The first two examples expect a unique key to be extracted from the SQL database, the third example lets CouchDB create the unique key. Of the three, the first and last are the most useful. The second example is for static or "throwaway tables" that have very few values, but might be a pain to just hand-copy (ex. config data).
You'll need the python couchdb module from http://code.google.com/p/couchdb-python/ , plus whatever DBAPI2.0 compliant module you use to connect to your favorite SQL database. These examples use Oracle, but it should be trivial to convert.
Unique Values in SQL Table Converted to Individual Documents
Cro-Magnon simple python script which will take the contents of an SQL table and place it into CouchDB.
This particular example depends on the SQL table having one unique field. The values in the unique field become individual documents in the CouchDB database.
This is probably not what you want.
1 #!/usr/bin/env python
2 import couchdb, cx_Oracle, ConfigParser, os, datetime
3
4 # Convert a list of (sql tables, unique identifiers) into couchDB documents
5 # Simply figure out which db you wish to import from, and a series of tables,
6 # with unique fields you'd like to have them
7 # indexed by, put them into the list of tuples below and let fly.
8 # I store all my config stuff in an ini file locally. you'll have to handle
9 # your own connection to your db.
10 # This is pretty quick-and-dirty and I don't recommend it for any sort of
11 # production anything at all! :-)
12 # Should work for any db api 2 compliant sql database.
13 # Script guaranteed 100% slower than christmas.
14
15 db_name = 'mydatabase'
16 table_names = [ ('TABLE_NAME0', 'UNIQUE_FIELD'),
17 ('TABLE_NAME1', 'UNIQUE_FIELD'),
18 ('TABLE_NAME2', 'UNIQUE_FIELD') ]
19
20 class GrabbyMitts(object):
21 def __init__( self, db_name ):
22 config = ConfigParser.ConfigParser()
23 config.optionxform = str
24 config.read( [ "sqlgen.ini", os.path.expanduser("~/.sqlgen.ini" ) ] )
25
26 # oracle connection
27 self.connection = cx_Oracle.Connection( config.get("Oracle", "login") )
28
29 # couchdb location
30 self.couch = couchdb.Server( "http://localhost:5984/" )
31 try:
32 self.db = self.couch.create( db_name )
33 except:
34 self.db = self.couch[ db_name ]
35
36 def description( self ):
37 # get a description of a given table
38 # returns the "header" information in list
39 query = "select * from %s where 1=0"%self.table_name
40 cursor = cx_Oracle.Cursor( self.connection )
41 cursor.execute( query )
42 description = [ i[0] for i in cursor.description ]
43 cursor.close()
44 return description
45
46 def uniques( self ):
47 # unique value in sql table to create couchdb document ID
48 cursor = cx_Oracle.Cursor( self.connection )
49 query = "select %s from %s"%( self.mykey, self.table_name )
50 cursor.execute( query )
51 myuniques = [ i[0] for i in cursor.fetchall() if i[0] ]
52 cursor.close()
53 return myuniques
54
55 def updateCouch( self, table_and_key ):
56 # populate or update couchdb documents using sql table and unique
57 # identifier
58 self.table_name, self.mykey = table_and_key
59 cursor = cx_Oracle.Cursor( self.connection )
60 documents = []
61 header = self.description()
62
63 query = """
64 select %s
65 from %s
66 where %s=:myunique"""""%( ", ".join( header ),
67 self.table_name,
68 self.mykey )
69 cursor.prepare( query )
70
71 for myunique in [ { "myunique": i } for i in self.uniques() ]:
72 cursor.execute( None, myunique )
73 entry = dict( zip( header, cursor.fetchone() ) )
74 # mop up datetime objects as they occur, since json cries foul.
75 # will probably need to convert to unix epochal time
76 for k, v in entry.items():
77 if isinstance( v, datetime.datetime ):
78 entry[ k ] = str( v )
79
80 if str( myunique[ 'myunique' ] ) not in self.db:
81 self.db[ str( myunique[ 'myunique' ] ) ] = entry
82 else:
83 doc = self.db[ str( myunique[ 'myunique' ] ) ]
84 doc.update( entry )
85 self.db[ str( myunique[ 'myunique' ] ) ] = doc
86
87 cursor.close()
88
89 if __name__ == "__main__":
90 gm = GrabbyMitts( db_name )
91 for table_name in table_names:
92 gm.updateCouch( table_name )
Unique Values in SQL Table Converted to a Single Document
Differs from above in that it places all table data into a single document.
select username, shoe_size, nostril_count, owns_weather_ballon from humans; username | shoe_size | nostril_count | owns_weather_balloon cletus | 10 | 3 | y
becomes:
1 db[ 'humans' ]{'username': 'cletus','shoe_size': 10,'nostril_count': 3,'owns_weather_balloon': 'y' }
This probably isn't what you want either! Script does no checking to make sure that your particular value is unique.
1 # HEY! THIS VERSION CAN MAKE A MOTHER-HUGE DOCUMENT! KNOW WHAT YOU ARE DOING!!!
2 # Convert a list of (sql tables, unique identifiers) into a single couchDB document
3 # Switch out with updateCouch() above
4
5 def updateCouch( self, table_and_key ):
6 # populate or update couchdb documents using sql table and unique
7 # identifier
8 # HEY! THIS CAN MAKE A MOTHER-HUGE DOCUMENT! KNOW WHAT YOU ARE DOING!!!
9 self.table_name, self.mykey = table_and_key
10
11 cursor = cx_Oracle.Cursor( self.connection )
12 documents = []
13 header = self.description()
14
15 query = """
16 select %s
17 from %s
18 order by %s"""""%( ", ".join( header ),
19 self.table_name,
20 self.mykey )
21
22 cursor.execute( query )
23 results = dict(
24 str( row[0] ), dict( zip( header, row ) )
25 for row in cursor.fetchall()
26 )
27
28 # clean up any datetime fields
29 for row in results:
30 for field, value in results[ row ].items():
31 if isinstance( value, datetime.datetime ):
32 results[ row ][ field ] = str( value )
33 self.db[ self.table_name ] = results
34
35 cursor.close()
All Values from SQL Table Imported, Let CouchDB Assign Keys
This is most likely what you want!
1 #!/usr/bin/env python
2 # -*- coding: utf-8 -*-
3 # You can get this from https://github.com/lysdexia/Oracle2CouchDB
4 #################################################################
5 # USAGE: Oracle2CouchDB
6 # Edit the o2c.ini file, in the [oracle] section, supply your oracle
7 # authentication values, the oracle table to be cloned and an optional
8 # larger cursor_arraysize value. (A larger arraysize will often be
9 # advantageous when using fetchmany())
10 # supply corresponding data for couchdb in the [couch] section
11 #################################################################
12
13 import cx_Oracle, datetime, sys, types, urllib2, ConfigParser, types
14 from subprocess import Popen, PIPE
15 import couchdb, couchdb.design
16 couchdb.json.use('cjson')
17
18 class OraSuck(object):
19 def get_connection(self, db):
20 self.connection = cx_Oracle.Connection(tns)
21 self.cursor = cx_Oracle.Cursor(self.connection)
22
23 def header(self, table_name):
24 self.cursor.execute("select * from %s where 1=0"%table_name)
25 return [i[0].lower() for i in self.cursor.description]
26
27 def sample(self):
28 rows = []
29 hdr = self.header(oratable)
30 cmd = """select %s from %s"""%(", ".join(hdr), oratable)
31
32 # if we have no specified arraysize, we'll use the fetchmany() default
33 # of fifty records.
34 if cursor_arraysize:
35 self.cursor.arraysize = int(cursor_arraysize)
36 self.cursor.prepare(cmd)
37 self.cursor.execute(cmd)
38 while True:
39 rows = [dict(zip(hdr, r))
40 for r in self.cursor.fetchmany()]
41 for row in rows:
42 for key in row:
43 # try to convert DATE to standard ISO format
44 if hasattr(row[key], "isoformat"):
45 row[key] = row[key].isoformat()
46 yield rows
47
48 class CouchBlow(object):
49
50 def connect(self):
51 couch = couchdb.Server(couch_server)
52
53 if not all([username, password]):
54 sys.exit("You'll need a username and password")
55
56 couch.resource.credentials = (username, password)
57 try:
58 db = couch[oratable]
59 except:
60 db = couch.create(oratable)
61 return db
62
63 # use ora.sample() generator to load db
64 def load(self):
65 get_rows = ora.sample()
66 while get_rows:
67 rows = get_rows.next()
68 for row in rows:
69 try:
70 self.db.save(row)
71 print (row["searchtime"])
72 except couchdb.http.ResourceConflict, error:
73 err = "%s: %s"%(error.message)
74 print ("%s: %s"%(row["searchtime"], err))
75
76 if __name__ == "__main__":
77 config = ConfigParser.ConfigParser()
78 config.optionxform = str
79 config.read("o2c.ini")
80
81 couch_server = config.get("couch", "couch_server")
82 username = config.get("couch", "username")
83 password = config.get("couch", "password")
84
85 oratable = config.get("oracle", "oratable")
86 oradb = config.get("oracle", "oradb")
87 tns = "@".join([config.get("oracle", "tns"), oradb])
88 cursor_arraysize = config.get("oracle", "cursor_arraysize")
89
90 cb = CouchBlow()
91 cb.db = cb.connect()
92
93 ora = OraSuck()
94 ora.get_connection(oradb)
95
96 cb.load()
Example .ini file
1 [oracle]
2 tns = scott/tiger
3 oradb = mydb
4 oratable = mytable
5 cursor_arraysize = 256
6
7 [couch]
8 couch_server = http://my.couchdb:5984
9 username = dragon_lady
10 password = what part of get thee gone do you not understand