codemonth.dk

One project every month - making stuff better ...

Exponential Moving Averages, Histograms and Gauges

Had a couple of extra minutes today, so I thought about if I could add something to the lightweight monitoring that I created earlier last month. One of things I really wanted, was to add exponential moving average. Moving averages, are normally a little better indicator of the overall performance, as a normal average tends to hide momentary bumps, that you might want to investigate.

Another thing that could be useful, was a histogram over the call times. A histogram, is good way of showing the distribution of calls, which will also show us, the amount of outliers that are there. Outliers can be hidden in averages, so having something to flag this, is a good thing.

The last really simple thing I wanted to add, was a gauge. So we already have a counter, but I wanted to add something where we could add up a total, in addition to just increment a number by one each time.

My first encounter with exponential moving averages (EMA), was actually not in the performance field. In fact most of my ideas about what to do in the Oracle world, rarely comes from doing stuff with Oracle. So at one point I was writing an investment engine for controlling my own pension fund (don't worry, not another "how you can make millions in 30 days", but more like "how to make sure to have a low but consistent gain over 30 years), and for that, I needed to add some EMA calculations to parts of the engine. What they do, is like a simple moving average (SMA; where you take the average of a certain range of numbers, and not the full set), EMA's do the same, but adds a factor where the later numbers have a slightly more important weight in the result than earlier results. Read up on wikipedia for the full details.

So if we start a performance counter for a certain operation, we can now get an EMA as well as a SMA. For now the SMA/EMA period is 10 samples, but I will change that to a variable in a later release. The result will be in the report table function, so we do not need to change anything to get this functionality. So let's take this small demo plsql block, and run the report on that:


declare
btime timestamp;
lcount number;
begin
stats_ninja.clear('stats_ninja_demo');
for i in 1..30 loop
  btime := systimestamp;
  select count(*)
  into lcount
  from all_objects
  where length(object_name) = round(dbms_random.value(10,25));
  stats_ninja.gs('stats_ninja_demo', systimestamp - btime);
end loop;
end;
/

Which gives us the following result:


PL/SQL procedure successfully completed.

SQL> select * from table(stats_ninja.report_gs('stats_ninja_demo'));

COUNTER_NAME	     STATISTIC_NAME			      STATISTIC_NUM_VAL
-------------------- ---------------------------------------- -----------------
stats_ninja_demo     Call count 					     30
stats_ninja_demo     Average calltime (ms)				   3254
stats_ninja_demo     Maximum calltime (ms)				   4698
stats_ninja_demo     Minimum calltime (ms)				   2402
stats_ninja_demo     Simple Moving Average (ms) 			   3430
stats_ninja_demo     Exp Moving Average (ms)				   3340

6 rows selected.

SQL> 

Only relying on averages, means not seeing the outliers, because of the smoothing that happens in any type of calculation of the mean. So to add a histogram to our performance monitor, we need to tell it what buckets we want to see the histogram in. For a detailed explanation on histograms (in simple terms, distribution of values in a set) you can go and read the wikipedia entry. For my first implementation of it, I have decided that the user defines both the number of buckets and the width. Later implementations might calculate this automatically, but it is not easy to find the right algorithm for this. Depends both on the size of the result set, and the value range, as picking the wrong bucket count and width, can just as easily obscure outliers, as any of the mean calculations can.

To add a histogram to our performance monitor, all we need to do, is to call the gs procedure with a special set of parameters, where we define the buckets and width:


stats_ninja.gs('stats_ninja_demo', 'hc', '2000,2100,2200,2300,2500,2700,2900,3100');

So if we wanted to see a histogram of the different runtimes of the previous code, all we need to do is add that little piece of code too our anonymous plsql block, and we can see the distribution:


declare
btime timestamp;
lcount number;
begin
stats_ninja.clear('stats_ninja_demo');
stats_ninja.gs('stats_ninja_demo');
stats_ninja.gs('stats_ninja_demo', 'hc', '2000,2100,2200,2300,2500,2700,2900,3100');
for i in 1..30 loop
  btime := systimestamp;
  select count(*)
  into lcount
  from all_objects
  where length(object_name) = round(dbms_random.value(10,25));
  stats_ninja.gs('stats_ninja_demo', systimestamp - btime);
end loop;
end;
/

And if we extract the report, we can see the following:


SQL> select * from table(stats_ninja.report_gs('stats_ninja_demo'));

COUNTER_NAME	     STATISTIC_NAME			      STATISTIC_NUM_VAL
-------------------- ---------------------------------------- -----------------
stats_ninja_demo     Call count 					     31
stats_ninja_demo     Average calltime (ms)				   2405
stats_ninja_demo     Maximum calltime (ms)				   3557
stats_ninja_demo     Minimum calltime (ms)				   2135
stats_ninja_demo     Simple Moving Average (ms) 			   2658
stats_ninja_demo     Exp Moving Average (ms)				   2831
stats_ninja_demo     Value histogram (<= 2000)				      0
stats_ninja_demo     Value histogram (<= 2100)				      0
stats_ninja_demo     Value histogram (<= 2200)				      9
stats_ninja_demo     Value histogram (<= 2300)				     11
stats_ninja_demo     Value histogram (<= 2500)				      4
stats_ninja_demo     Value histogram (<= 2700)				      2
stats_ninja_demo     Value histogram (<= 2900)				      0
stats_ninja_demo     Value histogram (<= 3100)				      1

14 rows selected.

SQL> 

The final thing I wanted to add, was a Gauge that could track the total amount of time, that was spent in all the samples. To enable that, again we just call the gs procedure with a special set of parameters:


stats_ninja.gs('stats_ninja_demo', 'g', 'total');

So our final code looks like this:


declare
btime timestamp;
lcount number;
begin
stats_ninja.clear('stats_ninja_demo');
-- Normal increment
stats_ninja.gs('stats_ninja_demo');
-- Enabling histograms
stats_ninja.gs('stats_ninja_demo', 'hc', '2000,2100,2200,2300,2500,2700,2900,3100');
-- Enabling the gauge
stats_ninja.gs('stats_ninja_demo', 'g', 'total');
for i in 1..30 loop
  btime := systimestamp;
  select count(*)
  into lcount
  from all_objects
  where length(object_name) = round(dbms_random.value(10,25));
  -- Increment with time statistics.
  stats_ninja.gs('stats_ninja_demo', systimestamp - btime);
end loop;
end;
/

And yields the following report:


SQL> select * from table(stats_ninja.report_gs('stats_ninja_demo'));

COUNTER_NAME	     STATISTIC_NAME			      STATISTIC_NUM_VAL
-------------------- ---------------------------------------- -----------------
stats_ninja_demo     Call count 					     31
stats_ninja_demo     Average calltime (ms)				   2633
stats_ninja_demo     Maximum calltime (ms)				   4423
stats_ninja_demo     Minimum calltime (ms)				   2210
stats_ninja_demo     Simple Moving Average (ms) 			   2595
stats_ninja_demo     Exp Moving Average (ms)				   2523
stats_ninja_demo     Counter gauge					  78990
stats_ninja_demo     Value histogram (<= 2000)				      0
stats_ninja_demo     Value histogram (<= 2100)				      0
stats_ninja_demo     Value histogram (<= 2200)				      0
stats_ninja_demo     Value histogram (<= 2300)				      4
stats_ninja_demo     Value histogram (<= 2500)				     13
stats_ninja_demo     Value histogram (<= 2700)				      4
stats_ninja_demo     Value histogram (<= 2900)				      4
stats_ninja_demo     Value histogram (<= 3100)				      0

15 rows selected.

SQL> 

So only one question remains: Why doesn't the count in the histogram buckets add up to the total number of samples (31 vs 4+13+4+4=25)? Well, I forgot to add the last bucket, which collects the ones that runs longer than the max of the last bucket. I will fix that in the next iteration.

Tagged in : DBMS_SESSION, Pipelined function, regex_substr, sys_context