hit_counter a,
hit_counter b
where b.tcomstamp = a.tcomstamp + 300
and b.statistic_id = a.statistic_id
order by a.tcomstamp, a.statistic_id
上述脚本有重大缺陷:如果第二个累计值不是正好在第一个累计值之后5分钟取得的,那么就无
法连接这两条记录。于是,我们改以“范围条件”定义连接。查询如下:
select a.tcomstamp,
a.statistic_id,
(b.counter - a.counter) * 60 /
(b.tcomstamp - a.tcomstamp) hits_per_minute
from hit_counter a,
hit_counter b
where b.tcomstamp between a.tcomstamp + 200
and a.tcomstamp + 400
and b.statistic_id = a.statistic_id
order by a.tcomstamp, a.statistic_id
这个方法还是有缺陷:前后两次计算累计值的时间间隔,如果不介于200 到400 秒之间(例
如取样频率改变了),如此之大的时间跨度就会引起风险。
我们还有更安全的方法,就是使用基于“记录窗口(windows of rows)”的OLAP函数(OLAP
function)。难以想象,这种本质上不太符合关系理论的技术可以显著提升xìng能,但应作为查询
优化的最后手段使用。借助partition 子句,OLAP函数支持“分别处理结果集的不同子集”,比如
分别对它们进行排序、总计等处理。借助OLAP 函数row_number(),可以根据statistic_id 建立
子集,然后按时间戳增大的顺序为不同统计赋予连续整数编号,接下来,就可以连接statistic_id
和两个序号了,如下例子所示:
select a.tcomstamp,
a.statistic_id,
(b.counter - a.counter) * 60 /
(b.tcomstamp - a.tcomstamp)
from (select tcomstamp,
statistic_id,
counter,
row_number( ) over (partition by statistic_id
order by tcomstamp) rn
from hit_counter) a,
(select tcomstamp,
statistic_id,
counter,
row_number( ) over (partition by statistic_id
order by tcomstamp) rn
from hit_counter) b