
List of important messages#

This page collects the error and fatal messages that have appeared in the recent past, together with instructions on what to do if you observe one of them during your shift. You should have applied the appropriate filter settings as described here; they filter out the messages that the shifter should ignore.

If an error or fatal message appears that is not described on this page even though the appropriate filter settings are applied, please always create a bookkeeping entry for the run as described here.

Hint:

Numbers in the message body change from one occurrence to the next. So do not search for the whole message, but only for the words that do not change. E.g. the message DPL: Splitting a huge cluster: chipID 233, rows 302:318 cols 468:596 can come from different chipIDs and different rows and columns. So search for DPL: Splitting a huge cluster instead.
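The hint above can be illustrated with a small Python sketch. It is not part of any shifter tool; the function names are made up for this example, and it assumes that `%` acts as a wildcard in the way the Facility columns on this page appear to use it (e.g. `qc-%`).

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern with '%' wildcards into a compiled regex.

    Assumption: '%' stands for "any text", as in the tables on this page.
    """
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(parts), re.DOTALL)


def matches(pattern: str, message: str) -> bool:
    """Return True if the stable text in `pattern` occurs in `message`."""
    return pattern_to_regex(pattern).search(message) is not None


msg = "DPL: Splitting a huge cluster: chipID 233, rows 302:318 cols 468:596"
other = "DPL: Splitting a huge cluster: chipID 17, rows 1:20 cols 5:140"

# Searching for the stable words finds both occurrences.
print(matches("DPL: Splitting a huge cluster", msg))    # True
print(matches("DPL: Splitting a huge cluster", other))  # True

# Searching for one full message misses the other occurrence.
print(matches(msg, other))                              # False
```

The second message is invented for the example; only the first one is quoted from this page.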

TfBuilder error#

Severity System Facility Message
Error datadist/tfbuilder Merging STFs error: STF first orbits do not match [...]

STFs from different detectors are out of sync, which means something is wrong with the data sent from the FLPs. This can cause problems when reconstructing the run. Report to the FLP oncall.

Framework / DPL#

Severity System Facility Message
Error DPL can be any Dropping incomplete DET/DATA/INDEX Lifetime::0 data in slot 0 with timestamp X < Y as it can never be completed.
Error DPL can be any We are trying to send an oldest possible timeslice X that is older than the last one we already sent Y
Error DPL readout-proxy Could not find management segment for shmid '4cc14c93'
Error DPL can be any Missing DATA/DESCRIPTION (lifetime:condition or lifetime:timeframe) while dropping incomplete data in slot X with timestamp Y < Y+1.

All above errors should be ignored at the moment. They are being investigated by experts.

DataDistribution#

Severity System Facility Message
Error datadist/tfscheduler addStfInfo: this TimeFrame was already built [...]

These errors can be ignored at the moment; they are being investigated by the experts. With the default filter settings you should not even see them.

ROOT#

Severity System Facility Message
Error STDERR can be any stderr: Error in %: VariableMetricBuilder Initial matrix not pos.def.

These are messages that ROOT writes to stderr and that get picked up by the STDERR monitor tool. Please ignore these messages for now.

QC#

Severity System Facility Message
Error QC % Could not register to ServiceDiscovery.

Please ignore this type of error message for now.

Severity System Facility Message
Error QC post/% An error occurred during retrieval
Error DPL qc-% Requested resource does not exist: ali-qcdb.cern.ch:8083/qc/%

Please check first whether the detector connected to the error has mentioned it on its known issues page. If not, please create a bookkeeping entry tagging, in addition to QC/PDP shifter, the detector linked to the error message (you may have to enable the Detector column in the InfoLogger to see it), and copy the full line from the InfoLogger (at least time, hostname, system, facility, detector, partition, run number). Only the "Requested resource does not exist" message is relevant. The "An error occurred during retrieval" message simply follows the other one and does not add any helpful information for the expert.

Please do not tag PDP in bookkeeping entries about missing objects for QC tasks; tag only QC/PDP shifter and the detector associated with the message.
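As a rough illustration of what such an entry should contain, the following Python sketch collects the InfoLogger fields listed above and derives the tags. The class, the function, and all field values are hypothetical placeholders; the real bookkeeping entry is created by hand in the bookkeeping tool, not through this code.

```python
from dataclasses import dataclass


@dataclass
class InfoLoggerLine:
    """Fields to copy from the InfoLogger line (names chosen for this sketch)."""
    time: str
    hostname: str
    system: str
    facility: str
    detector: str
    partition: str
    run: int
    message: str


def qc_missing_object_entry(line: InfoLoggerLine) -> dict:
    """Build the tags and text for a missing-QC-object bookkeeping entry."""
    # Tag QC/PDP shifter and the detector only; deliberately no "PDP" tag.
    tags = ["QC/PDP shifter", line.detector]
    text = (
        f"{line.time} {line.hostname} {line.system} {line.facility} "
        f"{line.detector} {line.partition} run {line.run}: {line.message}"
    )
    return {"tags": tags, "text": text}


# All values below are invented placeholders, not real log content.
example = InfoLoggerLine(
    time="2023-01-01 12:00:00",
    hostname="example-host",
    system="DPL",
    facility="qc-example-task",
    detector="XYZ",
    partition="example-partition",
    run=123456,
    message="Requested resource does not exist: ali-qcdb.cern.ch:8083/qc/%",
)
entry = qc_missing_object_entry(example)
print(entry["tags"])
print(entry["text"])
```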

MFT#

Severity System Facility Message
Error DPL mft-stf-decoder link cruID:0x0808/lID10 feeID:0x0123 (cable 17) has IR=BCid: 594 Orbit: 67398822 for current majority IR=BCid: 594 Orbit: 69495997 -> Old ROF, discarding

Please create a bookkeeping entry tagging only MFT (and not PDP).

MID#

Severity System Facility Message
Error DPL MIDRawDecoder RAWPARSER: Incomplete HBF - jump in packet counter 255 to 1 (1 total RawParser errors)

These errors can be ignored at the moment; they are being investigated by the experts.

EMCAL#

Severity System Facility ErrSource Message
Error DPL EMCALRawToCellConverterSpec AltroDecoder.cxx Error while decoding RCU trailer: Last RCU trailer word not found!

Please create a bookkeeping entry tagging "EMCAL" and "PDP" in addition to "QC/PDP Shifter". In your entry, copy and paste the full InfoLogger line, including at least the run number and hostname. If other issues appear at the same time, for example bad EMCAL QC plots, call the EMCAL oncall.

Severity System Facility Message
Error DPL EMCALRawToCellConverterSpec Not all EMC active links contributed in global BCid=261923256227: mask=0000000000000000000000000000001000000000000000

Please ignore the "Not all EMC active links contributed in" error for now.

PHOS#

Severity System Facility ErrSource Message
Error DPL PHOSRawToCellConverterSpec RawReaderMemory.cxx Trailer decoding error: Last RCU trailer word not found!

Please ignore these errors at the moment. Message from expert: "old error of PHOS SRI firmware. In rare cases SRU does not finish the event correctly, this the RCU trailer is missing. To fix it we need the new SRU firmware which will not happen in nearest future."

MCH#

Severity System Facility Message
Error QC post/% An error occurred during retrieval
Error DPL qc-pp-% Requested resource does not exist: ali-qcdb.cern.ch:8083/qc/MCH/MO/%

These errors are caused by a misconfiguration of the MCH QC JSON, which MCH needs to fix. Shifters should ignore them for now.

CCDB upload problems#

Severity System Facility ErrSource Message
Error DPL ccdb-populator CCDBPopulatorSpec.h failed on uploading to http://ccdb-test.cern.ch:8080 / FT0/Calib/TimeSpectraInfo for [1689213645332:1691805955826]

If this happens during a SYNTHETIC run, i.e. the upload to http://ccdb-test.cern.ch:8080 fails (see the message text), create a bookkeeping entry tagging CCDB. If it happens during a PHYSICS run, please call the PDP oncall.
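The rule above can be written down as a small Python sketch. The function names and the returned strings are made up for this illustration and do not belong to any existing tool; the regular expression assumes the message has the shape shown in the table above.

```python
import re
from urllib.parse import urlsplit

# Host of the test CCDB instance named in the message above.
TEST_CCDB_HOST = "ccdb-test.cern.ch"


def upload_host(message: str):
    """Extract the host of the failed upload target, or None if not found."""
    found = re.search(r"failed on uploading to (\S+)", message)
    return urlsplit(found.group(1)).hostname if found else None


def ccdb_upload_action(run_type: str) -> str:
    """Map the run type to the action described in this section."""
    if run_type == "SYNTHETIC":
        return "create a bookkeeping entry tagging CCDB"
    if run_type == "PHYSICS":
        return "call the PDP oncall"
    return "not covered by this section"


message = (
    "failed on uploading to http://ccdb-test.cern.ch:8080 / "
    "FT0/Calib/TimeSpectraInfo for [1689213645332:1691805955826]"
)

# The host in the message text hints at a SYNTHETIC run.
print(upload_host(message) == TEST_CCDB_HOST)
print(ccdb_upload_action("SYNTHETIC"))
print(ccdb_upload_action("PHYSICS"))
```

The run type itself is passed in by the caller, since this section defines the action by run type and mentions the upload host only as a hint.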

CCDB access problems#

Severity System Facility ErrSource Message
Error DPL dpl/ft0-reconstructor CcdbApi.cxx DPL: Requested resource does not exist: http://o2-ccdb.internal//download/83b37330-28ff-11ec-ab62-2a010e0a09fb
Error DPL dpl/ft0-reconstructor CcdbApi.cxx DPL: Curl request to http://o2-ccdb.internal//FT0/Calibration/ChannelTimeOffset/1635477272416/ failed

If the run can continue, create a bookkeeping entry for the run, tagging PDP and CCDB and including the full message. If it cannot, i.e. too many EPNs crash and the run has to be stopped because of this, call the PDP oncall.

GPU reconstruction issues#

Severity System Facility ErrSource Message
Error DPL dpl/gpu-reconstruction GPUChainTracking.cxx DPL: GPUReconstruction suffered from an error in the CPU part
Error DPL dpl/gpu-reconstruction GPUErrors.cxx DPL: GPU Error Code (0:20) ERROR_CF_PEAK_OVERFLOW : 282017 / 271507 / 0

Create a bookkeeping entry tagging PDP. As always, copy and paste the full line from the InfoLogger; at least the run number and the full error message must be included.

Error messages at EOR#

Severity System Facility Message
Error DPL % Some Lifetime::Timeframe data got dropped starting at X

Runs cannot be stopped cleanly at the moment, so there is always a flood of errors in the InfoLogger at EOR, which should be ignored. The above message may also appear at SOR or during a run; please ignore it for now. It is under expert investigation.

Severity System Facility Message
Error datadist/tfscheduler [FMQ] Uncaught exception reached the top of DeviceRunner: Invalid transition: END: failed to change state of a fairmq device

The above message may appear if a run is not stopped cleanly. It can be ignored at EOR. If it appears in the middle of an ongoing run, please contact the EPN team.

Messages in case processes crash#

A process on the EPNs can crash for different reasons (segmentation violation, OOM killer, unexpected input data, ...), and the reason is not always obvious from the message. Each crash leads to an error or fatal message in the InfoLogger. Whenever you spot such a message, create a bookkeeping entry with the relevant information for the experts. A template entry is shown here.