codemonth.dk

One project every month - making stuff better ...

Demographically correct test data

One of the main reasons for finishing version 1.5.0 of RANDOM_NINJA was to add localization for different countries, so that I could create test data sets that are demographically correct.

With that in place, the TESTDATA_NINJA package can now create statistically correct data sets for three countries:

  • United States
  • China
  • Denmark

The following ratios will be correct according to UN statistics and the CIA World Factbook:

  • Age group - Age ratios are divided into 0-14, 15-64 and 65+
  • Female/Male - Gender ratios are divided within each age group, according to the statistics

Identification numbers and birthdays will also be in statistically correct ratios and in valid formats.

As an example, I created three test tables using the population generator from the testdata_ninja package:

create table t_us as select * from table(testdata_ninja.population('US', 0.00001));
create table t_dk as select * from table(testdata_ninja.population('DK', 0.00001));
create table t_cn as select * from table(testdata_ninja.population('CN', 0.00001));

The number after the country code is the sample size, given as a fraction of the country's real population. So be careful: if you specify the value 1 for CN, you will get roughly 1.3 billion rows!
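To get a feel for a sample factor before running the generator, the expected row count can be estimated by multiplying the factor by the real population. This is only a rough sketch: the population figures are assumed round numbers, chosen because they line up with the row counts in the output further down, and they are not read from the package itself.

```sql
-- Rough estimate of the number of rows for a sample factor of 0.00001.
-- The population figures are assumed round numbers, not package internals.
select country,
       real_population,
       round(real_population * 0.00001) as expected_rows
from (
  select 'US' as country, 322500000 as real_population from dual union all
  select 'DK', 5500000 from dual union all
  select 'CN', 1357000000 from dual
);
```

Changing the factor in the query is a cheap way to check how large a table will get before actually creating it.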

See below for an example of the output:

SQL> @temp/demo1

Table created.

Elapsed: 00:00:07.59

US Population
-------------
	 3225

  US Males
----------
      1521

US Females
----------
      1704

Age group 0-14
--------------
	   626

Age group 15-64
---------------
	   2135

Age group 65+
-------------
	  464

Males 0-14
----------
       338

Females 0-14
------------
	 288

Males 15-64
-----------
       1067

Females 15-64
-------------
	 1068

 Males 65+
----------
       116

Females 65+
-----------
	348


Table created.

Elapsed: 00:00:00.15

DK Population
-------------
	   55

  DK Males
----------
	25

DK Females
----------
	30

Age group 0-14
--------------
	    10

Age group 15-64
---------------
	     36

Age group 65+
-------------
	    9

Males 0-14
----------
	 5

Females 0-14
------------
	   5

Males 15-64
-----------
	 18

Females 15-64
-------------
	   18

 Males 65+
----------
	 2

Females 65+
-----------
	  7

Table created.

Elapsed: 00:00:23.38

CN Population
-------------
	13570

  CN Males
----------
      7570

CN Females
----------
      6000

Age group 0-14
--------------
	  2334

Age group 15-64
---------------
	   9960

Age group 65+
-------------
	 1276

Males 0-14
----------
      1470

Females 0-14
------------
	 864

Males 15-64
-----------
       5577

Females 15-64
-------------
	 4383

 Males 65+
----------
       523

Females 65+
-----------
	753

SQL>
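The demo script itself is not listed in this post. As a minimal sketch, counts like the ones above could be produced with conditional aggregation. The column names gender and birthdate are assumptions made for illustration, so check the real column names with describe t_us before running it.

```sql
-- Hypothetical column names: gender ('M'/'F') and birthdate (DATE).
-- These are assumptions; the real columns of t_us may differ.
select count(*)                                          as "US Population",
       count(case when gender = 'M' then 1 end)          as "US Males",
       count(case when gender = 'F' then 1 end)          as "US Females",
       count(case when age < 15 then 1 end)              as "Age group 0-14",
       count(case when age between 15 and 64 then 1 end) as "Age group 15-64",
       count(case when age >= 65 then 1 end)             as "Age group 65+"
from (
  -- Age in whole years, derived from the assumed birthdate column.
  select gender,
         trunc(months_between(sysdate, birthdate) / 12) as age
  from t_us
);
```

The same query can be pointed at t_dk or t_cn by changing the table name and the column aliases.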
