Advertisements
Feeds:
Posts
Comments

Archive for July, 2017

With some applications, there may be a need to pull data from other database instances, or from other databases within the same instance.  Often this can be achieved by having multiple connections from your application pointing to each of these data sources. This is great for this one application, but if there is a need to do this within the database instance or for stored procedures or views then we will generally use full part naming convention in RDMS world.

For example, SQL Server offers the functionality to reference objects within the database you are working or to reference objects in another database or even a different instance of SQL Server.  This is referred to as four-part naming.  The reason for this name is that there can be four parts that are used to reference the object as the following shows:

server.database.schema.object

or we can also reference the database that the object resides in such as

select * from master.dbo.sysdatabases

A similar approach in MongoDB is to use db.getSiblingDB() database method and looping for each database. This is loosely analogous to the undocumented Stored Procedure in SQL Server called sp_MSforeachdb which is quite handy when you do not want to use cursors. You can use db.getSiblingDB() as an alternative to the use helper. This is particularly useful when writing scripts using the mongo shell where the use helper is not available. To expand on that, I have written a simple script that will loop through all the databases with the instance and then use db.getCollectionNames() method to access Collections within each database.

  • Main Code –

// The following command-line parameters can be set in an eval block:
var sleepBetweenDBs =   sleepBetweenDBs || 4000;
var sleepBetweenBatches =  sleepBetweenBatches || 1000;
var batchsize = batchsize  || 50000;
var swidsPath =  "/Users/pathToLoadIDs/testids.js"
var m4_dbsPath =  "/Users/pathToDBsArray/DBsArray.js"

//Time Log starts
var startTime = new Date();print("Start time = " + startTime)
var baseHours = startTime.getHours();
print (baseHours);

load(swidsPath)
var swidsToRemove = JSON.parse(swids);
arraylength = swidsToRemove.length
print("arraylength ",arraylength)

// LIST DBs
load(m4_dbsPath)
var m4dbsToProcess = JSON.parse(m4dbs);
dbcount = m4dbsToProcess.length
print ("Database count " + dbcount)
print("Databases being processed in this run are as below ")
printjson(m4dbsToProcess)

//   Main script entry

temp = 0
for (index = 0; index < arraylength; index = index + batchsize) {
chunk = swidsToRemove.slice(index, index+batchsize);
print ("\n**************   Current batch size: " + chunk.length + " |  Processed SWIDs so far:  " + temp + " |  Total SWIDs in this run :  " + arraylength + "   **************")
temp = temp + chunk.length
m4dbsToProcess.forEach(function(database) {
db = db.getSiblingDB(database);
if (db != 'local' && db !='admin') {

print("-------------------------------------------------------------------");
print("\database currently processing : " + db)
print("-------------------------------------------------------------------");
cols = db.getCollectionNames();
colcount = cols.length deletedCountPerBatchDB = 0 cols.forEach(function(collname) {
collectionObject = db[collname];
print("\nCount of Collections waiting to be processed next in " + db + " : " + colcount)

if(typeof collectionObject !== "undefined" ) {
    var InitialDocCount = collectionObject.count() 

print("collection Name : ", collname)
print("\nStarting removal process for " + collname + "...\n")
print("...") 

var c = collectionObject.find({m4Id: {$in: chunk}}).count();
print ("select count of records found : " + c)  

var deletedCountPerColl = (InitialDocCount - collectionObject.count());
print("Total documents deleted for collection (" + collname + ") in this batch : ", deletedCountPerColl) ; colcount = colcount - 1;

deletedCountPerBatchDB = (deletedCountPerBatchDB + deletedCountPerColl);
}      

if (colcount == 0) {
print("\nTotal count of all documents deleted from " + db + " in this batch = \n" + deletedCountPerBatchDB + "\n")
      }
    });
  }sleep(sleepBetweenDBs);
});

var percentcomplete = ((temp*100)/arraylength)if (percentcomplete ==100) { print("\nAll SWIDs Successfully Processed | Percent Processed: " + percentcomplete + "%");  }
else  {
       print("STATUS: Percent Processed so far in this run : " + percentcomplete + "%");
// print("\nMoing to next batch...")
}sleep(sleepBetweenBatches);
};

print("===================================================================");

var endTime = new Date();print(endTime);
var timeSpent = ((endTime.getTime() - startTime.getTime())/60000);
print("Script took " + timeSpent + " minutes.");
  • Executing the above script –

You can directly call the .js file from mongo shell as below, and mongo will execute the JavaScript directly.


$MONGO_HOME\bin>mongo.exe -u username -p password server[:port]/Admin [--eval "var param1=value, param2=value"] path_to_script.js

  • An example –

mongo -u user_m4 -p password devdata01.dataginger.com/m4_TSS  --eval "var sleepBetweenBatches=1000" script_dev.js

Advertisements

Read Full Post »