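Paginating query results with a PostgreSQL named cursor, as opposed to OFFSET + LIMIT queries, effectively keeps each query result around, so turning pages does not recompute it. [1, 2, 3]
These are my own test results. The data set is small: the result set used in this test has only 20 rows, so the effect is not very pronounced. (OK, honestly, the amount of data I have right now does not call for a scheme this complicated. I just wanted to try something new ^_^)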
    In [m]: %%timeit
       ....: for i in range(10):
       ....:     c.execute(sql_c, (sql_m, i*2, 2))
       ....:     list(c)
       ....:
    100 loops, best of 3: 9.83 ms per loop

    In [n]: %%timeit
       ....: for i in range(10):
       ....:     c.execute(sql_m.replace('%', '%%') + ' offset %s limit %s', (i*2, 2))
       ....:     list(c)
       ....:
    10 loops, best of 3: 19.8 ms per loop
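A rough way to see where the gap comes from is to count how many rows the server has to produce under each scheme. The sketch below is a deliberately crude model, not a measurement: it assumes an OFFSET query must compute the skipped rows before discarding them (which PostgreSQL does), assumes a held cursor computes its result only once, and ignores per-query overhead, sorting, and caching. The function names are made up for illustration.

```python
def rows_produced_offset(pages: int, page_size: int) -> int:
    # Each OFFSET/LIMIT query produces the skipped rows as well as the
    # rows it returns, so page i costs (i + 1) * page_size rows.
    return sum((i + 1) * page_size for i in range(pages))


def rows_produced_cursor(pages: int, page_size: int) -> int:
    # A held cursor computes the result once; later pages only reposition.
    return pages * page_size


# Same shape as the benchmark: 10 pages of 2 rows over a 20-row result.
print(rows_produced_offset(10, 2))  # 110
print(rows_produced_cursor(10, 2))  # 20
```

With only 20 rows the absolute difference is tiny, which fits the modest speedup measured here; the model suggests the gap grows with the number of pages visited.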
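I use a PostgreSQL function to create or reuse a cursor. Its inputs are the query text, an offset, and the number of rows to fetch (plus the name of the table that tracks cursor usage). The function checks whether a matching cursor already exists; if not, it creates one, naming it with the md5 of the query text prefixed with "p". The query text is of course assembled by the program, so there is none of the variation you get from hand-typed queries that mean the same thing but are written differently.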
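The naming rule can be reproduced on the client side, which is handy for checking which cursor a given query maps to. This is a minimal sketch, assuming the database encoding is UTF-8 so that the server's md5() hashes the same bytes; cursor_name is a hypothetical helper, not part of the setup described here.

```python
import hashlib


def cursor_name(query: str) -> str:
    """Client-side mirror of 'p' || md5(query)."""
    # Assumption: the database encoding is UTF-8, so hashing the UTF-8
    # bytes here matches what md5() sees on the server.
    return "p" + hashlib.md5(query.encode("utf-8")).hexdigest()


q1 = "select name from users where name like 'a%' order by last_login_time"
q2 = "SELECT name FROM users WHERE name LIKE 'a%' ORDER BY last_login_time"

print(cursor_name(q1))
print(len(cursor_name(q1)))                # 33: the prefix plus 32 hex digits
print(cursor_name(q1) == cursor_name(q2))  # False: same meaning, different text
```

The second comparison shows why the query text has to be generated consistently: any difference in spelling or whitespace yields a different cursor.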
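PostgreSQL cursors have two important properties. First, a cursor's contents do not change when the underlying data is updated, so once the relevant data changes, the cursors already created are stale. I wrote a function that creates triggers to clean up these stale cursors. Cursors also take up memory or disk space, so cursors that go unused for a long time need to be cleaned up. For that I maintain a table recording when each cursor was last used, plus a cleanup function.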
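The second property: even with WITH HOLD, a cursor lives only as long as the current session (connection) and is visible only within that session. So the cleanup function cleanupCursors also has to close cursors that have no record in the table.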
    CREATE OR REPLACE FUNCTION createCursorTable(name text)
    RETURNS void AS $$
    BEGIN
      EXECUTE format('CREATE TABLE IF NOT EXISTS %I (
        name text UNIQUE,
        last_used TIMESTAMP WITH TIME ZONE default current_timestamp
      )', name);
      EXECUTE format('CREATE INDEX ON %I (last_used)', name);
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION createTriggerFor(tname text, cname text)
    RETURNS void AS $$
    BEGIN
      EXECUTE format($f$
        CREATE TRIGGER %I
        AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
        ON %I
        FOR EACH STATEMENT
        EXECUTE PROCEDURE cleanupTriggerFunc (%L)
      $f$, 'cleanupCursorForTable_' || tname || '_' || cname, tname, cname);
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION dropTriggerFor(tname text, cname text)
    RETURNS void AS $$
    BEGIN
      EXECUTE format($f$
        DROP TRIGGER %I on %I
      $f$, 'cleanupCursorForTable_' || tname || '_' || cname, tname);
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION cleanupTriggerFunc()
    RETURNS TRIGGER AS $$
    DECLARE
      cname text := TG_ARGV[0];
    BEGIN
      EXECUTE format('SELECT cleanupCursors(%L, 0)', cname);
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION fetchFromCursor(tname text, query text, off integer, size integer)
    RETURNS SETOF record AS $$
    DECLARE
      cname text := 'p' || md5(query);
      need_update boolean := false;
    BEGIN
      PERFORM name FROM pg_cursors WHERE name = cname;
      IF NOT FOUND THEN
        EXECUTE format('DECLARE %I SCROLL CURSOR WITH HOLD FOR ', cname) || query;
        RAISE NOTICE 'new cursor % created', cname;
        BEGIN
          EXECUTE format('INSERT INTO %I (name) VALUES (%L)', tname, cname);
        EXCEPTION
          WHEN unique_violation THEN
            need_update := true;
        END;
      ELSE
        need_update := true;
      END IF;
      IF need_update THEN
        EXECUTE format('UPDATE %I SET last_used = current_timestamp WHERE name = %L', tname, cname);
      END IF;
      EXECUTE format('MOVE ABSOLUTE ' || off || ' FROM %I', cname);
      RETURN QUERY EXECUTE format('FETCH ' || size || ' FROM %I', cname);
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION cleanupCursors(tname text, timeout real)
    RETURNS integer AS $$
    DECLARE
      c record;
      i integer := 0;
    BEGIN
      FOR c IN EXECUTE format($f$
        SELECT name FROM %I
        WHERE extract('epoch' from current_timestamp - last_used) > %L
      $f$, tname, timeout) LOOP
        PERFORM name FROM pg_cursors WHERE name = c.name;
        IF FOUND THEN
          RAISE NOTICE 'closing cursor %', c.name;
          EXECUTE format('CLOSE %I', c.name);
        END IF;
        RAISE NOTICE 'clean up record for cursor %', c.name;
        EXECUTE format($f$ DELETE FROM %I WHERE name = %L$f$, tname, c.name);
        i := i + 1;
      END LOOP;
      FOR c IN EXECUTE format($f$
        SELECT name FROM pg_cursors
        WHERE name NOT IN (
          SELECT name FROM %I
        ) AND length(name) = 33 AND substring(name for 1) = 'p'
      $f$, tname) LOOP
        RAISE NOTICE 'closing cursor % not present in table %', c.name, tname;
        EXECUTE format('CLOSE %I', c.name);
        i := i + 1;
      END LOOP;
      RETURN i;
    END;
    $$ LANGUAGE plpgsql;
When using this, the cleanupCursors function needs to be called fairly often.
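Calling the cleanup function from the application might look like the sketch below. It assumes a psycopg2-style DB-API connection (%s placeholders); the helper name cleanup_cursors, the table name "cursors", and the 600-second timeout are illustrative choices, not something prescribed by the setup above.

```python
def cleanup_cursors(conn, table="cursors", timeout=600.0):
    """Run cleanupCursors(table, timeout) on the given DB-API connection.

    Returns the count reported by the PostgreSQL function.
    """
    cur = conn.cursor()
    try:
        # The explicit cast keeps the call matched to the
        # cleanupCursors(text, real) signature whatever literal the
        # driver emits for a Python float.
        cur.execute("SELECT cleanupCursors(%s, %s::real)", (table, timeout))
        (cleaned,) = cur.fetchone()
    finally:
        cur.close()
    # Commit so the deletions from the tracking table become visible.
    conn.commit()
    return cleaned
```

Since cursors are only visible to the session that declared them, the connection passed in should be the one that serves the paginated queries. Run on a different connection, the function can still delete timed-out records from the shared table, but it cannot close cursors held elsewhere.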
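PostgreSQL functions have one more quirk: when a function returns setof record, PostgreSQL does not know how to interpret those records. So when calling fetchFromCursor you have to spell out the row type of the result explicitly: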
    select * from fetchFromCursor(
      'cursors',
      $$select name from users where name like 'a%' order by last_login_time$$,
      0, 10
    ) as f(name text);
It's a bit of a hassle.
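One way to take some of the tedium out of this is to generate the column definition list in the client. The following is a minimal sketch assuming a psycopg2-style driver with %s placeholders; fetch_page_sql is a hypothetical helper, and its validation only accepts plain identifiers and simple type names (no "numeric(10,2)" or array types).

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TYPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_ ]*$")


def fetch_page_sql(columns):
    """Build the SELECT that calls fetchFromCursor with a column list.

    columns: sequence of (name, type) pairs describing the rows that the
    paged query returns, e.g. [("name", "text")].
    """
    parts = []
    for name, type_ in columns:
        # Names and types are spliced into the SQL text, so reject
        # anything that is not a plain identifier or simple type name.
        if not _IDENT.match(name) or not _TYPE.match(type_):
            raise ValueError(f"unsupported column definition: {name!r} {type_!r}")
        parts.append(f"{name} {type_}")
    return (
        "SELECT * FROM fetchFromCursor(%s, %s, %s, %s) AS f("
        + ", ".join(parts)
        + ")"
    )


sql = fetch_page_sql([("name", "text")])
print(sql)
# With a real connection the call would then be along the lines of:
#   c.execute(sql, ("cursors", query, 0, 10))
```

Because the paged query is passed as a parameter rather than pasted into the statement, any "%" inside it needs no escaping.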