The "better quality" empty array - or: Why doing nasty things can help avoiding ORA-00600!

Sometimes, Oracle databases are doing "crazy" things - and often, this results in ORA-00600 or ORA-07445 errors. Yes, it's software and there isn't any software in the world without bugs.

This time, we've got an ORA-00600 error, only for  a specific database version (12.1) and only with a specific PSU level. Or at least, we only could reproduce this error only on this database version and PSU level (which does not mean, by experience, it could not happen anywhere else):
ORA-00600 [pmus.c#101] and/or ORA-00600 [pmus.c#103] where raised while importing a special file into our database. 

While my colleagues found at least two (unfixed) bugs in support.oracle.com and opened an SR, I tried to figure out in deep with one of our senior developers, what happens here, how and when we can reproduce this error and maybe, how can we create a workaround???

In the trace file, I have found an interesting thing which signaled the error:

========== FRAME [15] (pmusmal_Trim_All()+821 -> kgeasnmierr()) ==========
defined by frame pointers 0x7fffa4b69fb0  and 0x7fffa4b69f40
CALL TYPE: call   ERROR SIGNALED: yes   COMPONENT: (null)

   
At this point in time, we did now knew what signaled the error inside of our code, only that an internal oracle function tried to Trim_All, but trimming what? 
While testing, we've found out that a combination of IN/OUT parameters together with a loop were responsible for triggering this error. But wrapping the function with additional local variable declaration, the error wasn't raised anymore. Wrapping the function is, unfortunately, ok in a sql session, but not in the pl/sql code of our application. 

So this code raised an error while other, rewritten code, didn't:
DECLARE
  p_ifi_id NUMBER := 12345;
BEGIN
  FOR i IN 1 .. 10 LOOP
    DECLARE
      ld_startzeitstempel DATE;
      lt_val              pkg_zr_def.t_table_of_value;
      lt_sta              pkg_zr_def.t_table_of_stati;

-- TYPE t_table_of_stati is TABLE OF VARCHAR2(10) INDEX BY BINARY_INTEGER;       lt_sta2             pkg_zr_def.t_table_of_stati;
      lt_sta2_flag        pkg_zr_def.t_table_of_flags;
      lt_zeitstempel      pkg_import_lg.t_table_of_string; -- TYPE t_table_of_string is TABLE OF VARCHAR2(20) INDEX BY BINARY_INTEGER;
 
      lt_dstg             pkg_import_lg.t_table_of_string;
      lc_zpt_id           VARCHAR2(80);
      lc_linie            VARCHAR2(80);
      ld_ghi_von          DATE;
      ld_ghi_bis          DATE;
      ln_kan_id           NUMBER;
      ln_ghi_id           NUMBER;
      ln_zrd_id           NUMBER;
      l_index          PLS_INTEGER;
      l_sta_orig       VARCHAR2(10);
      l_act_state_list VARCHAR2(1000);
    BEGIN
      pkg_import_lg.prc_get_lastprofil

                                      (p_ifi_id      => p_ifi_id,
                                       p_rec_id      => i,
                                       p_putarr_von  => ld_startzeitstempel,
                                       p_val         => lt_val,
                                       p_sta         => lt_sta,
                                       p_sta2        => lt_sta2,
                                       p_sta2_flag   => lt_sta2_flag,
                                       p_zeitstempel => lt_zeitstempel,
                                       p_dstg        => lt_dstg,
                                       p_zpt_id      => lc_zpt_id,
                                       p_linie       => lc_linie,
                                       p_ghi_von     => ld_ghi_von,
                                       p_ghi_bis     => ld_ghi_bis,
                                       p_kan_id      => ln_kan_id,
                                       p_ghi_id      => ln_ghi_id,
                                       p_zrd_id      => ln_zrd_id);
    END;
  END LOOP;
END;


Moving on with the search, I have found a little nice sentence at support.oracle.com:
Bug 6085136 : ORA-00600 [PMUS.C#103] WHEN SUBMITTING ORACLE APPS TIMECARD. 

At the first sight, this is an Oracle Application error that should have nothing to do with our problem, but read by yourself:
"pmus.c is for handling plsql collections.  At the point it fails it seems to be setting a plsql table variable to null. Part of this involves trimming all elements from the collection and it is during the trim it falls over." 

Didn't we have found this pmusmal_Trim_All()+821 at the trace file? Looking for the call stack I've found this: 
----- Abridged Call Stack Trace -----
ksedsts()+244<-kjzdssdmp()+321<-kjzduptcctx()+692<-kjzdpcrshnfy()+347<-kstdmp()+352<-dbkedDefDump()+8728<-ksedmp()+261<-ksfdmp()+89<-dbgexPhaseII()+1845<-dbgexProcessError()+2539<-dbgeExecuteForError()+110<-dbgePostErrorKGE()+1870<-dbkePostKGE_kgsf()+90<-kgeadse()+429
<-kgerinv_internal()+49<-kgerinv()+40<-kgeasnmierr()+150<-pmusmal_Trim_All()+821<-pmucstrmall()+46<-pevm_SETN()+482
----- End of Abridged Call Stack Trace ----- 


PMUCSTRMALL? Doesn't sound this like a function for handling "strings"? And we do have some table of strings, working as an IN OUT parameter to our procedure.  Then we've found the second bug note with a really nice hint for us: Bug 24421301 : QP_MODIFIERS_PUB ORA-600 [PMUS.C#103] CALLING PACKAGE tells us
"From the tracefile ebsmock51_ora_10693_i312469.trc and in conjunction with the plsql code, plsql callstack, I can see that - an attempt was made to (re-)initialize the package globals ( records, collections ) in PACKAGE QP_Modifiers_PUB via BSETN , and that's when the internal error occurred, while processing the null initialization of a certain collection in that package."

And again, this sounded like it was the same as our problem (global initialization of a package, remote declaration of an array,...), and yes, we do only have the problem when we are running through a loop. We don't have it, if we use local variables and different calls to this procedure. 
Time to hand this over to Oracle in our SR, you think? Well, I did had a different view on that (I don't like to be streamlined, you know) - and I thought: There MUST be a usable workaround and I WANT to find it together with our developer


As a summary: It must have to do with arrays of string, it must have to do something with initialization of these arrays while initializing a package global and it only happens, if you loop, not if you call the procedure for the first time.

Next, I'd suggested something really nasty: Why don't do something like "initialize" our pl/sql tables of strings? Or just deleteing the NEW declared tables of strings??? So we've tried to add the deletes and we 've put a dbms_output into the code to find out, if the variables are really empty at runtime. 

Take a look at the following code (changes to the upper code are marked green):
DECLARE
  p_ifi_id NUMBER := 12345;
BEGIN
  FOR i IN 1 .. 10 LOOP
    DECLARE
      ld_startzeitstempel DATE;
      lt_val              pkg_zr_def.t_table_of_value;
      lt_sta              pkg_zr_def.t_table_of_stati;
      lt_sta2             pkg_zr_def.t_table_of_stati;
      lt_sta2_flag        pkg_zr_def.t_table_of_flags;
      lt_zeitstempel      pkg_import_lg.t_table_of_string;
      lt_dstg             pkg_import_lg.t_table_of_string;
      lc_zpt_id           VARCHAR2(80);
      lc_linie            VARCHAR2(80);
      ld_ghi_von          DATE;
      ld_ghi_bis          DATE;
      ln_kan_id           NUMBER;
      ln_ghi_id           NUMBER;
      ln_zrd_id           NUMBER;
      l_index          PLS_INTEGER;
      l_sta_orig       VARCHAR2(10);
      l_act_state_list VARCHAR2(1000);
    BEGIN

      dbms_output.put_line(lt_zeitstempel.count || ' xxx ' || lt_dstg.count);
      lt_zeitstempel.delete;
      lt_dstg.delete;
      pkg_import_lg.prc_get_lastprofil

                                      (p_ifi_id      => p_ifi_id,
                                       p_rec_id      => i,
                                       p_putarr_von  => ld_startzeitstempel,
                                       p_val         => lt_val,
                                       p_sta         => lt_sta,
                                       p_sta2        => lt_sta2,
                                       p_sta2_flag   => lt_sta2_flag,
                                       p_zeitstempel => lt_zeitstempel,
                                       p_dstg        => lt_dstg,
                                       p_zpt_id      => lc_zpt_id,
                                       p_linie       => lc_linie,
                                       p_ghi_von     => ld_ghi_von,
                                       p_ghi_bis     => ld_ghi_bis,
                                       p_kan_id      => ln_kan_id,
                                       p_ghi_id      => ln_ghi_id,
                                       p_zrd_id      => ln_zrd_id);
    END;
  END LOOP;
END;


The output of the dbms_output was 0xxx0 - this is, what it should be. The variable is declared for EVERY loop again, so it's a new, empty, fresh variable that can NEVER hold any values at the time the dbms_ouput is executed! But - with our deletes, the ORA-00600 wasn't raised again!!
So we made a "better quality empty new created" variable out of an "empty new created" variable and that heals the Oracle bug??? Yes... Strange, isn't it??? And this happend only with VARCHAR2(20) and not with VARCHAR2(10) arrays??? Indeed... 

Now we have at least our workaround while my colleagues try to figure out, if patching the database with a newer PSU will fix that problem and they have time to work together with Oracle support to get a solution working for all databases.
Nevertheless, this shows me again - it's worth to try things you can call stupid, dump or nasty if you run in an Oracle database bug.

The only thing that counts at the end is: The customer can run his daily business without being interrupted by an internal Oracle database error.

----- Update December 2017 -----

The problem is now filed as an official bug at support.oracle.com:

Bug 27242226 : ORA-600[PMUS.C#103] WHILE CREATING/EXECUTING PACKAGE IN SINGLE BYTE CHARACTERSET

The bug was invented with 12.1.0.1 (error does not happen in 11.2.0.4) and is still open at the actual version (12.2.0.1) with the latest PSUs.

Update (19.09.2018) - A fix is available, have a look at this new post for details.

Autum can be nice

I like the autumn, if it is sunny and warm outside. We do often have this kind of weather here at Freiburg (Breisgau) and at the Kaiserstuhl, a small, formerly volcanic area nearby. And for me as a photographer the number of colours are overwhelming...

Kaiserstuhl