Change psql concurrency from autocommit to serializable. #1190
Open
acceso wants to merge 2 commits into SpriteLink:master from acceso:concurrentdeletes
Conversation
Autocommit is giving concurrency errors in PostgreSQL when operations
are sent in parallel. Using serializable transactions seems to fix it.
For example:
ERROR: deadlock detected
DETAIL: Process 176184 waits for ShareLock on transaction 15529683; blocked by process 191002.
Process 191002 waits for ShareLock on transaction 15529684; blocked by process 178678.
Process 178678 waits for ExclusiveLock on tuple (1386,16) of relation 43815 of database 16391; blocked by process 176184.
Process 176184: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.10.240/28'
Process 191002: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.11.0/28'
Process 178678: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.10.208/28'
HINT: See server log for query details.
CONTEXT: while locking tuple (1386,16) in relation "ip_net_plan"
SQL statement "UPDATE ip_net_plan SET children =
(SELECT COUNT(1)
FROM ip_net_plan
WHERE vrf_id = OLD.vrf_id
AND iprange(prefix) << iprange(old_parent.prefix)
AND indent = old_parent.indent+1)
WHERE id = old_parent.id"
PL/pgSQL function tf_ip_net_plan__prefix_iu_after() line 92 at SQL statement
STATEMENT: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.10.240/28'
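PostgreSQL aborts one of the processes because the three DELETEs form a cycle in the wait-for graph: per the DETAIL lines above, 176184 waits on 191002, which waits on 178678, which waits on 176184. A toy illustration of the cycle detection the server performs (the function and its name are this illustration's own, not PostgreSQL code), using the PIDs from the log:

```python
def find_wait_cycle(waits_for):
    """Walk the wait-for graph and return a cycle if one exists.
    waits_for maps a blocked PID to the PID holding the lock it wants."""
    for start in waits_for:
        seen = [start]
        pid = start
        while pid in waits_for:
            pid = waits_for[pid]
            if pid == start:
                return seen           # closed the loop back to start: deadlock
            if pid in seen:
                break                 # a cycle, but not through start
            seen.append(pid)
    return None

# Wait-for edges taken from the DETAIL lines of the error above.
cycle = find_wait_cycle({176184: 191002, 191002: 178678, 178678: 176184})
print(cycle)  # [176184, 191002, 178678]
```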
houndci-bot reviewed May 29, 2018
nipap/nipap/backend.py (outdated)
```diff
  try:
      self._con_pg = psycopg2.connect(**db_args)
-     self._con_pg.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
+     self._con_pg.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
```
line too long (98 > 79 characters)
Author
I have more feedback about this change. It works when concurrency is low, but as concurrency increases the deadlocks cause timeouts. I think the actual problem is harder to fix: there are concurrency issues in the database code itself, and a proper fix for those is beyond my reach. So far we have mitigated the problem by adding retries in the frontend code, which hides the errors from our users.
Author
Here is a different patch that reverts the previous change and instead locks the tables for the deletes. We have run it for one week and have had no complaints so far. Performance is around 50% lower, but no more deadlocks for us.
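The patch itself is not reproduced here; what follows is a hedged sketch of the approach it describes, not the author's code. The helper name and the exact lock mode are assumptions (`SHARE ROW EXCLUSIVE` conflicts with itself, so concurrent deletes serialize behind the lock, which matches the reported behavior of no deadlocks at roughly half the throughput):

```python
def locked_delete_statements(table, where_clause):
    """Build a transaction that takes an explicit table lock before the
    DELETE, so the children-count trigger can never deadlock against a
    concurrent delete. Writers are fully serialized on the table."""
    return [
        "BEGIN",
        f"LOCK TABLE {table} IN SHARE ROW EXCLUSIVE MODE",
        f"DELETE FROM {table} WHERE {where_clause}",
        "COMMIT",
    ]

stmts = locked_delete_statements(
    "ip_net_plan", "vrf_id = 0 AND prefix = '10.0.10.240/28'")
```

Each statement would then be executed in order on one connection; the lock is released at COMMIT.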
This problem is happening pretty often in our deployment (several times per day). I can easily reproduce it with a simple (and partial) Python script; list_of_prefixes is a file with one prefix per line.
I guess this patch only hides the problem, but changing every query/transaction/function would not be as easy.
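The reproduction script itself is not included in the thread; a hedged sketch of the pattern it describes (`delete_prefix` is a hypothetical stand-in for whatever issues the DELETE through the NIPAP backend; only the file name `list_of_prefixes` comes from the comment above):

```python
from concurrent.futures import ThreadPoolExecutor

def delete_all(prefixes, delete_prefix, workers=8):
    """Fire one delete per prefix from a pool of worker threads --
    enough parallelism to trigger the trigger-level deadlocks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(delete_prefix, prefixes))

# In the real reproduction each call would run something like
#   DELETE FROM ip_net_plan WHERE vrf_id = 0 AND prefix = %s
# and the prefixes would be read from the list_of_prefixes file, e.g.:
#   prefixes = open("list_of_prefixes").read().split()
```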