使用 PostgreSQL 的命名游标对查询结果进行分页,相对于 OFFSET+LIMIT 查询,相当于保留了每个查询结果,避免了在翻页时的重复计算。1, 2, 3
这是我自己的测试结果。数据量不大,这个测试用的结果集才 20 条结果,所以效果不太明显。(好吧,其实我这边目前的数据量也没必要用现在这个复杂的方案。只是尝试新东西而已啦 ^_^)
In [m]: %%timeit
....: for i in range(10):
....: c.execute(sql_c, (sql_m, i*2, 2))
....: list(c)
....:
100 loops, best of 3: 9.83 ms per loop
In [n]: %%timeit
....: for i in range(10):
....: c.execute(sql_m.replace('%', '%%') + ' offset %s limit %s', (i*2, 2))
....: list(c)
....:
10 loops, best of 3: 19.8 ms per loop
我使用了一个 PostgreSQL 函数来创建或者复用 cursor。此函数输入参数有:查询语句、位置偏移、获取的数量。这个函数会检查是否已经存在对应的 cursor,如果没有就把查询语句的 md5 值加前缀「p」作为名字。查询语句当然是程序拼接的,不会有人工输入的那种意义相同但是某些写法不一样造成的不同。
PostgreSQL cursor 有两个很重要的特性。其一,它的内容不会随着数据的更新而更新。所以,在相关数据更新之后,已经创建的 cursor 的数据就陈旧了。我创建了一个创建触发器的函数以便清理陈这些旧的 cursor。另外,cursor 是会占用内存或者磁盘空间的,因此要清理掉长期不使用的 cursor。为此,我维护了一张记录 cursor 最后使用时间的表,以及一个清理函数。
PostgreSQL cursor 特性之二:即使指定了WITH HOLD
,cursor 的生存期也只在当前会话(连接),并且只在当前会话中看得到。所以,清理函数cleanupCursors
还需要将没有记录的 cursor 清除。
CREATE OR REPLACE FUNCTION createCursorTable(name text) RETURNS void AS $$
BEGIN
EXECUTE format('CREATE TABLE IF NOT EXISTS %I (
name text UNIQUE,
last_used TIMESTAMP WITH TIME ZONE default current_timestamp
)', name);
EXECUTE format('CREATE INDEX ON %I (last_used)', name);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION createTriggerFor(tname text, cname text) RETURNS void AS $$
BEGIN
EXECUTE format($f$
CREATE TRIGGER %I
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON %I
FOR EACH STATEMENT
EXECUTE PROCEDURE cleanupTriggerFunc (%L)
$f$, 'cleanupCursorForTable_' || tname || '_' || cname,
tname, cname);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION dropTriggerFor(tname text, cname text) RETURNS void AS $$
BEGIN
EXECUTE format($f$ DROP TRIGGER %I on %I $f$,
'cleanupCursorForTable_' || tname || '_' || cname, tname);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION cleanupTriggerFunc() RETURNS TRIGGER AS $$
DECLARE
cname text := TG_ARGV[0];
BEGIN
EXECUTE format('SELECT cleanupCursors(%L, 0)', cname);
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION fetchFromCursor(tname text, query text, off integer, size integer)
RETURNS SETOF record AS $$
DECLARE
cname text := 'p' || md5(query);
need_update boolean := false;
BEGIN
PERFORM name FROM pg_cursors WHERE name = cname;
IF NOT FOUND THEN
EXECUTE format('DECLARE %I SCROLL CURSOR WITH HOLD FOR ', cname) || query;
RAISE NOTICE 'new cursor % created', cname;
BEGIN
EXECUTE format('INSERT INTO %I (name) VALUES (%L)', tname, cname);
EXCEPTION
WHEN unique_violation THEN
need_update := true;
END;
ELSE
need_update := true;
END IF;
IF need_update THEN
EXECUTE format('UPDATE %I SET last_used = current_timestamp WHERE name = %L',
tname, cname);
END IF;
EXECUTE format('MOVE ABSOLUTE ' || off || ' FROM %I', cname);
RETURN QUERY EXECUTE format('FETCH ' || size || ' FROM %I', cname);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION cleanupCursors(tname text, timeout real) RETURNS integer AS $$
DECLARE
c record;
i integer := 0;
BEGIN
FOR c IN
EXECUTE format($f$
SELECT name FROM %I
WHERE extract('epoch' from current_timestamp - last_used) > %L
$f$, tname, timeout)
LOOP
PERFORM name FROM pg_cursors WHERE name = c.name;
IF FOUND THEN
RAISE NOTICE 'closing cursor %', c.name;
EXECUTE format('CLOSE %I', c.name);
END IF;
RAISE NOTICE 'clean up record for cursor %', c.name;
EXECUTE format($f$ DELETE FROM %I WHERE name = %L$f$, tname, c.name);
i := i + 1;
END LOOP;
FOR c IN
EXECUTE format($f$
SELECT name FROM pg_cursors WHERE name NOT IN (
SELECT name FROM %I
) AND length(name) = 33 AND substring(name for 1) = 'p'
$f$, tname)
LOOP
RAISE NOTICE 'closing cursor % not present in table %', c.name, tname;
EXECUTE format('CLOSE %I', c.name);
i := i + 1;
END LOOP;
RETURN i;
END;
$$ LANGUAGE plpgsql;
使用时需要经常去调用下cleanupCursors
函数。
PostgreSQL 函数还有这么一个特性,当函数返回setof record
时,PostgreSQL 不知道怎么解读那些 record。所以用fetchFromCursor
函数时得明确指定获取结果的行类型:
select * from fetchFromCursor('cursors', $$select name from users where name like 'a%' order by last_login_time$$, 0, 10) as f(name text);
有点麻烦。