Avoiding ORA-04068 existing state of packages has been discarded

http://betteratoracle.com/posts/34-avoiding-ora-04068-existing-state-of-packages-has-been-discarded

Avoiding ORA-04068 existing state of packages has been discarded

Sometime back, I wrote about the new 11gR2 feature that allows PLSQL to be changed without an outage. In that series of articles I pointed to ‘fine grained dependency checking’ that was introduced in 11gR1 as an enabler for this feature citing that online changes in 10g were impossible. However, there was a subtle, but now pretty obvious point that I missed back then, and some online changes are in fact possible in 10g

Not All Invalidations Are Equal

In Oracle 10g and before, if a package referenced a table, and the table was modified (by adding a column for example), the package would immediately be marked as invalid. In 11g this invalidation can be avoided, but even if the package does become invalid, it is not as bad as it seems.

The next time the package is called, execution will be delayed for a short period while Oracle recompiles it, and if the compile is successful, it will be called as normal. Even if a connected session has a prepared call to that procedure open, it will handle this sort of invalidation seamlessly. For example:

create table t1 (c1 integer);

create or replace package pkg1
as
  procedure p1;
end;
/

create or replace package body pkg1
as
  procedure p1 
  is
  begin
    for row in (select * from t1) loop
      null;
    end loop;
  end;
end;
/

Next I create a simple JDBC program using JRuby. This program doesn’t do anything useful, but it illustrates the point. I first prepare and execute a call to my package, then I sleep. While the program is sleeping I add a column to T1 which will invalidate the package (because there is a select * in the procedure P1).

require 'java'
java_import 'oracle.jdbc.OracleDriver'
java_import 'java.sql.DriverManager'

oradriver = OracleDriver.new

DriverManager.registerDriver oradriver
@connection = DriverManager.get_connection
    "jdbc:oracle:thin:@localhost:1521/local11gr2.world",      
    'sodonnel', 'sodonnel'
@connection.auto_commit = false

call = @connection.prepare_call("begin pkg1.p1; end;")

call.execute
puts "called once"

sleep(15)

call.execute
puts "called twice"

While the JRuby program is sleeping, I executed the following code:

alter table t1 add (c2 integer);

select object_name, status 
from user_objects
where object_name like 'PKG1';

OBJECT_NAME         STATUS  
------------------- ------- 
PKG1                VALID   
PKG1                INVALID   

So the change to the table invalidated my package, but the JRuby program completed the second call to the package without an error, proving it is not an issue.

ORA-04068: existing state of packages has been discarded

In the above example, the invalidation of the package code is not a problem, as any application will recover seamlessly from it. However, a simple change to the package body can spoil the party:

create or replace package body pkg1
as
  g_var integer; -- <<- Global variable

  procedure p1 
  is
  begin
    for row in (select * from t1) loop
      null;
    end loop;
  end;
end;
/

By adding a global variable to the package, it now has internal state, and this is a problem. Even if this variable is a constant it will still cause problems on invalidations.

If you run the above example again, modifying the table while the JRuby code is sleeping, the second call will produce the following exception:

NativeException: java.sql.SQLException: ORA-04068: existing state of packages has been discarded

This is an artefact of how Oracle maintains the internal package state. A package with no global variable has no internal state, and when it is recompiled Oracle knows it does not have to throw an ORA-04068 if the package is accessed again in the same session.

However, a package with internal state (caused by having a package global variable) will lose that state when it is recompiled, causing the ORA-04068.

Avoiding ORA-04068

To avoid this error, all you have to do is avoid using global variables in your package. That can be easier said than done in some cases, and many well written applications will define a series of constants in the package unknowingly causing this problem.

A workaround is to define a ‘constants package’ that does nothing other that define the constants, and a separate package that accesses the tables and retrieves the constants from the constants package. In that way, the package with the internal state will not be invalidated by any changes to the underlying schema and active sessions will not see any errors.

In conclusion if you need to modify tables while the application is online, especially in 10g, give careful thought to how the code is organised to avoid unexpected ORA-04068 errors in the application.

Update – if you use stand alone functions or procedures in your application, you may want to read about how you can still get an ORA-04068, which was something I wasn’t expecting.

 

http://betteratoracle.com/posts/42-ora-04068-when-compiling-a-function

ORA-04068 when compiling a function

I thought I had figured out all the pitfalls of changing stored procedures and functions on-line, and then it happened again:

ORA-04068: existing state of packages has been discarded

This occurred when upgrading a single stored procedure – no packages, no complex interdependencies and on Oracle 11.2.0.3. The procedure I changed was not referenced by any other procedures, and it was being invoked from Java on a fairly busy system.

I am pretty sure this shouldn’t happen, so I decided to create a test case to try and prove it one way or another.

First of all I need a procedure to play with – pretty much the simplest procedure possible (p1.sql):

create or replace procedure p1
as
begin
  null;
end;
/

And I need a second procedure to change on-line (p1_1.sql):

create or replace procedure p1
as
begin
  null;
  null;
end;
/

Notice the second ‘NULL;’ call in the second procedure, which makes it different from the first version. Then I need some Java code to call this procedure in a loop – again I am going for the most simple Java code I can:

import java.sql.*;
import oracle.jdbc.*;

public class SimpleProc {

    private static Connection conn;

    public static void main(String[] args)
    throws ClassNotFoundException, SQLException, InterruptedException
    {
        connect();

        CallableStatement stmt = conn.prepareCall("begin p1(); end;");
        int count = 0;
        while (true) {
            stmt.execute();
            count ++;
            System.out.println ("Made procedure call number "+ count);
            Thread.sleep(1000);
        }
    }

    public static void connect()
    throws ClassNotFoundException, SQLException
    {
        DriverManager.registerDriver
            (new oracle.jdbc.driver.OracleDriver());

        String url = "jdbc:oracle:thin:@//localhost/local11gr2.world";
        //            jdbc:oracle:thin:@//host:port/service

        conn = DriverManager.getConnection(url,"hotplsql","hotplsql");
        conn.setAutoCommit(false);
    }

}

This code prepares a call to the procedure P1, and then executes it in a loop, printing out the execution count after each execution. Then it sleeps for a second before executing the procedure again, over and over until the process is killed. One thing to note is that the call to the procedure is only prepared one time, but the procedure is executed many times, which is a common pattern in a well written application. If I start the Java running, and open a SQLPLUS session, I find that I can change between the two version of my stored procedure without causing any errors in the Java code:

SQL> @p1

Procedure created.

SQL> @p1_1

Procedure created.

SQL> @p1

Procedure created.

SQL> @p1_1

Procedure created.

SQL> @p1

Procedure created.

So this looks good, it would seem I can recreate the procedure with no problems, which is what I thought when I suggested changing this procedure on our production system.

It turns out not to be that simple. If I make a small change to the Java code, and remove the Thread.sleep(1000) line, things change dramatically. As soon as I compile the new version of my procedure, I get this error:

Made procedure call number 21368
Made procedure call number 21369
Made procedure call number 21370
Made procedure call number 21371
Exception in thread "main" java.sql.SQLException: ORA-04068: existing state of packages has been discarded
ORA-04065: not executed, altered or dropped stored procedure "HOTPLSQL.P1"
ORA-06508: PL/SQL: could not find program unit being called: "HOTPLSQL.P1"
ORA-06512: at line 1

        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
        at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:204)
        at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:1007)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1315)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
        at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3677)
        at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:4694)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1086)
        at SimpleProc.main(SimpleProc.java:16)

Which is exactly what happened when I put this procedure into production. The error isn’t a total disaster, as if I had my call wrapped in a try-catch block, then on the next execution it will work correctly, but it still means some transactions will fail.

In my opinion there is something strange going on here – when there is a bit of a delay between executions Oracle can handle the procedure recompiling, but if the execution happens too quickly after the recompile, it fails. Tracing the session reveals some interesting information. With a delay in the Java code, the trace file looks like this:

    PARSING IN CURSOR #390309088 len=16 dep=0 uid=223 oct=47 lid=223 tim=264320251394 hv=3303240157 ad='7ff1fa66878' sqlid='36ytppb2f6wfx'
    begin p1(); end;
    END OF STMT
    EXEC #390309088:c=0,e=197,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264320251389
    WAIT #390309088: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264320251979

    *** 2013-01-10 10:21:00.532
    WAIT #390309088: nam='SQL*Net message from client' ela= 997998 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264321250098
    EXEC #390309088:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264321250370
    WAIT #390309088: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264321250444

    *** 2013-01-10 10:21:01.532
    WAIT #390309088: nam='SQL*Net message from client' ela= 999933 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264322250439
    EXEC #390309088:c=0,e=179,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264322251647
    WAIT #390309088: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264322251825

    <snip>

    *** 2013-01-10 10:21:11.531
    WAIT #390309088: nam='SQL*Net message from client' ela= 998423 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264332250242
**> EXEC #390309088:c=0,e=3213,p=0,cr=0,cu=0,mis=1,r=1,dep=0,og=1,plh=0,tim=264332253903
    WAIT #390309088: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264332254126

    *** 2013-01-10 10:21:12.531
    WAIT #390309088: nam='SQL*Net message from client' ela= 996845 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264333251088
    EXEC #390309088:c=0,e=182,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264333251703
    WAIT #390309088: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264333251880

Notice that the cursor is parsed and then is executed many times with no further parsing (mis=0 in the EXEC lines). Then after I recompile the procedure, we get a EXEC line with mis=1 before it goes back to executing without any misses. So it appears that Oracle is smart enough to notice the procedure has changed, and then gracefully reparse and continue on without an error.

If I remove the sleep command from the Java code the trace file looks different:

    PARSING IN CURSOR #221881568 len=16 dep=0 uid=223 oct=47 lid=223 tim=264706629638 hv=3303240157 ad='7ff1fa66878' sqlid='36ytppb2f6wfx'
    begin p1(); end;
    END OF STMT
    EXEC #221881568:c=0,e=88,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264706629636
    WAIT #221881568: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264706629773

    <snip>

    EXEC #221881568:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264712941846
    WAIT #221881568: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712941882
    WAIT #221881568: nam='SQL*Net message from client' ela= 149 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712942058
    EXEC #221881568:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264712942135
    WAIT #221881568: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712942188
    WAIT #221881568: nam='SQL*Net message from client' ela= 1626 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712943853
**> WAIT #221881568: nam='library cache pin' ela= 10196 handle address=8792328936552 pin address=8792292571936 100*mode+namespace=842320396222466 obj#=-1 tim=264712954206
**> EXEC #221881568:c=0,e=10380,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=264712954356
**> ERROR #3:err=4068 tim=264712954394

In this trace file, we can see the process has waited on ‘library cache pin’, which didn’t happen before. This blocks the execution for a short time and is caused by my other session recompiling the stored procedure. You cannot execute a procedure while it is being compiled, and you cannot compile a procedure while it is being executed either, which is what this pin enforces. As soon as the recompile completes, the library cache pin is released and execution can continue. However this time, instead of the EXEC reporting a library cache miss, it attempts to execute the procedure and fails.

This seems like a bug – maybe the cache pin is released too soon, before all the old cursor resources are cleaned up, or maybe once a process gets as far as being blocked by a cache pin, it has passed the point where it can figure out it needs to reparse the cursor. Either way it is disappointing as it means you cannot safely recompile a procedure in a busy live system (unless of course you use Edition Based Redefinition) without risking some errors.

 

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s