One project every month - making stuff better ...

Building Facebook or Germany ....

Whether it is for work, or we are just playing with a new idea back home, there comes a time when we need to create some test data for whatever it is we are testing. The biggest problem with creating test data, is creating some data, that actually looks like real data. Because we want "real" data, if we really want to test how our application works or performs. Things like index selectivity, histograms and much more inside oracle will all behave differently depending on what kind of data there are in your tables. So creating life like data, is important.

Luckily I have previously created a base library that generates random data within a lot of different data domains. The RANDOM_NINJA package can create all that data, but because of the required flexibility in that library, it might not be user friendly, if you just want to create some tables fast to test on. So I decided to create a library that is based on the random_ninja library but binds it together in a little more easy way to create data.

The library is based on generators. All generators are pipelined functions, making it easy to use in create table statements. All generators by default will return 10 rows, but that can be controlled by setting the testdata_ninja.g_default_generator_rows parameter.

In the future there will be more control options for the generators, apart from just the rowcount. The current available generators (and their fields) are:

  • people - Generate a table of people with basic information.
    • Country - Country of the person.
    • Identification - The ID number of the person. Will be a valid ID based on the Country.
    • First name - First name. Will be country and gender specific if data is available in random_ninja.
    • Middle name - Possible middle name. Will be country and gender specific if data is available in random_ninja.
    • Last name - Last name. Will be country and gender specific if data is available in random_ninja.
    • Birthdate - Birthdate of the person.
    • Gender
  • users - Generate a table of website users.
    • Username - Username of website user.
    • Email - User email address.
    • Address1 - First part of address. Always present.
    • Address2 - Second part of address. Randomly present.
    • Zipcode - Zipcode of the address.
    • State - State of the address.
    • Creditcard - Type of credit card.
    • Creditcard number - Creditcard number.
    • Creditcard expiry - Creditcard expiry.
    • Password - The hashed password.
  • cdr - Generate a table of cdr records.
    • Orig imsi - IMSI number of the originating caller phone.
    • Orig isdn - Phone number of originating caller.
    • Orig imei - IMEI of originating caller.
    • Call type - Call type.
    • Call type service - Call service type.
    • Call start latitude - Latitude of the originator at the start of the call.
    • Call start longtitude - Longtitude of the orginator at the start of the call.
    • Call date - Date of the call.
    • Call duration - Call length in seconds.
    • Dest imsi - IMSI number of the destination caller phone.
    • Dest isdn - Phone number of destination phone.
    • Dest imei - IMEI of destination.
    • Network operator - Network operator code.
  • articles - Create a table of articles.
    • Author - The author of the article.
    • Written - Date article is written.
    • Headline - Headline.
    • Lead paragraph - Leading paragraph.
    • Main article - The main part of the article.
    • End paragraph - Final articel paragrpah.

Here are a couple of quick examples, of how to use these generators:

SQL> select author, written, headline from table(testdata_ninja.articles);

------------------------- --------- ---------------------------------------------------------------------------
Connor Martin		  25-MAY-88 Wosfif firewej jusuzso ce cuzki popici hiwtitmun kofihab hed pek bifnig ho.
Jocelyn Clark		  15-MAR-54 Kicajek kita nobi suvil biri.
Cooper Peterson 	  20-AUG-53 Zice ribibe tivez suhipis hodgusot leli deji kutel rikerru.
Joshua Sanchez		  07-SEP-49 Pelkepo recu nina todrur fuzu metne kito fadofmo.
Jack Bennett		  09-FEB-89 Sevwima hemzippa revo faj gurari jornecve.
Robert Cook		  18-FEB-63 Fik dovic su cideb viwbiw mukcis kewi dafot ludne.
Claire Peterson 	  14-MAR-27 Celov cifnel hidim himho cab tothi pewoj.
Angel Lewis		  18-AUG-85 Neffeh sikgac vepde dodced somi ve.
Kimberly Brooks 	  25-MAY-24 Vispefe wongip rovce dosohfi wopahgul decus seccor li ne kemigi ribziv.
Camila James		  05-NOV-79 Del jec si tefiw junvej firi me domop hokwod wagi.

10 rows selected.


SQL> select orig_isdn, dest_isdn, call_type, network_operator from table(testdata_ninja.cdr(5));

---------------- ---------------- -------------------- ----------------
    610395645768     610374552632 GPRS				  50533
    610473346765     610346293254 SMS-MO			  50533
    610396294227     610445323195 MTC				  50508
    610465183483     610441277852 MTC				  50528
    610816661374     610474214618 MTC				  50531


SQL> select email, creditcard, creditcard_num, creditcard_expiry from table(testdata_ninja.users(20))

------------------------------ ------------ ---------------- ----------				    5445474721236337 11/2019		       electron     4026872779551439 11/2016	       dccarte	      30078814745248 10/2018	       instapay     6377124485183353 10/2018	       dcintl	      36118322325483 10/2016	       bankcard     5610034121170356 12/2017		       maestro	    5018241101642219 11/2016		       instapay     6376485566264604 12/2018	       visa	    4073124637535242 11/2016 	       bankcard     5610601881735414 12/2017		       dcintl	      36937035846404 11/2019

------------------------------ ------------ ---------------- ----------	       laser	    6304371058259425 11/2016	       dcintl	      36735890535763 11/2017	       electron     4026625732541781 11/2018	       dcenroute     201484253888890 11/2017	       instapay     6370776794328336 10/2018		       instapay     6371441304932486 10/2017	       bankcard     5610378422745082 12/2017		       laser	    6304356763584429 10/2017 	       laser	    6304865268346945 10/2019

20 rows selected.


A part from the mentioned column, each generator will always return 3 different metadata columns, that can be used. One is [generator_name]_num_pk which can be used as a numerical primary key. Then we have [generator_name]_char_pk which is a varchar based primary key. Finally there is the [generator_name]_cdate which will be the time the row was created by the generator.

See below example, for those colums:

SQL> alter session set nls_date_format = 'DD-MM-YYYY HH24:MI:SS';

Session altered.

SQL> select user_num_pk, user_cdate, email, creditcard_num from table(testdata_ninja.users(4));

---------------- ------------------- ------------------------------ ----------------
	       1 25-10-2016 01:02:23		      36702155323657
	       2 25-10-2016 01:02:23 	    6304411475212445
	       3 25-10-2016 01:02:23 	    6274429878873241
	       4 25-10-2016 01:02:23	     201407836981467

SQL> r
  1* select user_num_pk, user_cdate, email, creditcard_num from table(testdata_ninja.users(4))

---------------- ------------------- ------------------------------ ----------------
	       1 25-10-2016 01:02:29		    5166911688365472
	       2 25-10-2016 01:02:29	    5454445162043414
	       3 25-10-2016 01:02:29	    5193611224545843
	       4 25-10-2016 01:02:29	    5134618238215427


To install the package you have the easy way or the hard way. The hard way is to first download the RANDOM_NINJA library, install that and then download TESTDATA_NINJA and install the sources. To get the installation order, simply look at the random_ninja npg.spec file and then the testdata_ninja npg.spec file.

The other way (hint: the easy way) is to use the NPG package manager and just use the npg command to install the package. Below you can see how easy it is. It handles all the necessary dependencies for you automatically.

[oracle@localhost cli]$ ./npg list all t/t
NPG List

  Installed on 17-Sep-2016
  Version installed 0.0.1 (EE949D8852D8437171BE78477C3F264F88030A45)
  Installed on 17-Sep-2016
  Version installed 0.0.1 (ADCF5CA602A75ABC4FD2C1008E74D25FE648AE76)
  Installed on 03-Sep-2016
  Version installed 0.0.3 (D37510AF8BD581EA5FAF1658FB730DE49BC6FFCD)

[oracle@localhost cli]$ ./npg install testdata_ninja t/t
NPG Install

Starting installation of: testdata_ninja
Session user is: T
Package: testdata_ninja ready to be installed.
Downloading NPG file.
Unpacking npg zip file.
Validating NPG requirements.
NPG Package random_ninja required and not installed.
random_ninja exists and is being installed.
Starting installation of: random_ninja
Session user is: T
Package: random_ninja ready to be installed.
Downloading NPG file.
Unpacking npg zip file.

Validating NPG requirements.
All NPG requirements validated.
Compiling sources.
Sources compiled without errors.
1 NPG installed successfully.
All NPG requirements validated.
Compiling sources.
Sources compiled without errors.
1 NPG installed successfully.

[oracle@localhost cli]$ 

The next generators being added to the library:

  • population - Generate a table of a demographically correct people based on country chosen.
  • transactions - Generate a table of transactions, be it credit card transactions, bank transactions or merchant pos transactions.
  • shopping - Generate a table of consumer buys.

Most of these generators will depend on version 1.1.0 of the random_ninja library though, so that one is coming first.

Tagged in : RANDOM_NINJA