XREF stops with error when scanning orphans that reference an invalid logical file name

Environment

linux 2.6.32-431.29.2.el6.x86_64

Description

We've been upgrading our dev Thor system, and now we see XREF issues. This has been discussed internally, so apologies if I've missed something.

We have tried the following advice:

You need to use daliadmin to delete them: manually expand the name out to its internal representation, then delete that.

For "key::salt_coop::rid::meow::father::", it's not clear what the trailing (empty) component is: it could be a Scope, a File, or a SuperFile.

The internal XPaths for those would be:

For Scope : /Files/Super[@name="key"]/Scope[@name="salt_coop"]/Scope[@name="rid"]/Scope[@name="meow"]/Scope[@name="father"]/Scope[@name=""]
For File : /Files/Super[@name="key"]/Scope[@name="salt_coop"]/Scope[@name="rid"]/Scope[@name="meow"]/Scope[@name="father"]/File[@name=""]
For SuperFile : /Files/Super[@name="key"]/Scope[@name="salt_coop"]/Scope[@name="rid"]/Scope[@name="meow"]/Scope[@name="father"]/SuperFile[@name=""]

You would remove with, e.g.:
daliadmin <daliip> delete '/Files/Super[@name="key"]/Scope[@name="salt_coop"]/Scope[@name="rid"]/Scope[@name="meow"]/Scope[@name="father"]/Scope[@name=""]'
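
The manual expansion described above can be sketched as a small helper. This is only an illustration, assuming the path layout exactly as written in this ticket (including `Super` as the first element, which is exposed as the hypothetical `root_element` parameter and has not been checked against the Dali store). The function names are made up for this sketch, and it only builds command strings; it does not run daliadmin.

```python
import shlex

def candidate_xpaths(logical_name, root_element="Super"):
    """Expand a '::'-separated logical name into the three candidate paths.

    root_element mirrors the first element as written in this ticket;
    it is an assumption, not verified against the Dali store layout.
    """
    *scopes, tail = logical_name.split("::")
    if not scopes:
        raise ValueError("expected at least one '::' separator")
    parts = ["/Files", f'{root_element}[@name="{scopes[0]}"]']
    parts += [f'Scope[@name="{s}"]' for s in scopes[1:]]
    base = "/".join(parts)
    # The trailing component could be a Scope, a File, or a SuperFile.
    return {kind: f'{base}/{kind}[@name="{tail}"]'
            for kind in ("Scope", "File", "SuperFile")}

def delete_command(dali_ip, xpath):
    # Builds the command line only; nothing is executed here.
    return f"daliadmin {dali_ip} delete {shlex.quote(xpath)}"

paths = candidate_xpaths("key::salt_coop::rid::meow::father::")
for kind, xpath in paths.items():
    print(kind, delete_command("<daliip>", xpath))
```

Printing the commands first, rather than running them, leaves room to check each path by eye before deleting anything.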

With the following results:

./daliadmin 10.193.64.11 delete '/Files/Super[@name="key"]/Scope[@name="salt_coop"]/Scope[@name="rid"]/Scope[@name="meow"]/Scope[@name="father"]/SuperFile[@name=""]'
ERROR: Could not connect to /Files/Super[@name="key"]/Scope[@name="salt_coop"]/Scope[@name="rid"]/Scope[@name="meow"]/Scope[@name="father"]/SuperFile[@name=""]
Complete at 10:49:02
[root@p-tdalidev01 bin]$ ./daliadmin 10.193.64.11 delete '/Files/Super[@name="key"]/Scope[@name="salt_coop"]'
ERROR: Could not connect to /Files/Super[@name="key"]/Scope[@name="salt_coop"]

Most of the time nothing appears in the logs, but we did once see:

00002D78 2015-10-20 07:06:54.394 30559 30569 "ERROR: /var/lib/jenkins/workspace/LN-Candidate-withplugins-5.4.2-1/LN/centos-6.4-x86_64/HPCC-Platform/dali/sasha/saxref.cpp(2217) : XREF: : Scope is blank in file name 'key::salt_coop::rid::meow::father::'"
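
Since this message shows up only occasionally, a quick way to collect the offending names from a log is sketched below. The pattern is taken from the single line quoted above, so other platform versions may word the message differently; `blank_scope_names` is a hypothetical helper name.

```python
import re

# Pattern copied from the one log line quoted in this ticket (an assumption
# that the wording is stable across builds).
BLANK_SCOPE = re.compile(r"XREF: : Scope is blank in file name '([^']*)'")

def blank_scope_names(log_lines):
    """Return the distinct logical names reported with a blank scope, in order seen."""
    seen = []
    for line in log_lines:
        match = BLANK_SCOPE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

sample = [
    '00002D78 2015-10-20 07:06:54.394 30559 30569 "ERROR: /var/lib/jenkins/workspace/LN-Candidate-withplugins-5.4.2-1/LN/centos-6.4-x86_64/HPCC-Platform/dali/sasha/saxref.cpp(2217) : XREF: : Scope is blank in file name \'key::salt_coop::rid::meow::father::\'"',
    "a line that does not mention XREF",
]
for name in blank_scope_names(sample):
    print(name)
```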

We do see errors in the Sasha logs, but they don't seem to be relevant:

000021B5 2015-11-16 03:27:48.657 113153 113166 "FILEEXPIRY: Deleting scrub::cuem::c100280001::i000240732_b151397001_eexperian_20150519_030002.txt_postpreprocessor_w20150519-045354_builddata"
000021B6 2015-11-16 03:27:48.730 113153 113166 "ERROR: scrub::cuem::c100280001::i000240732_b151397001_eexperian_20150519_030002.txt_postpreprocessor_w20150519-045354_builddata - cause: [ -1: Can't remove scrub::cuem::c100280001::i000240732_b151397001_eexperian_20150519_030002.txt_postpreprocessor_w20150519-045354_builddata: Cannot remove file scrub::cuem::c100280001::i000240732_b151397001_eexperian_20150519_030002.txt_postpreprocessor_w20150519-045354_builddata as owned by SuperFile(s): scrub_input::cuem::w20150519-110533] "
000021B7 2015-11-16 03:27:48.730 113153 113166 "ERROR: 12: /var/lib/jenkins/workspace/LN-Candidate-withplugins-5.4.2-1/LN/centos-6.4-x86_64/HPCC-Platform/dali/sasha/saxref.cpp(2352) : FILEEXPIRY: remove : DFS Exception: 12: Failed to delete file: scrub::cuem::c100280001::i000240732_b151397001_eexperian_20150519_030002.txt_postpreprocessor_w20150519-045354_builddata - cause: [ -1: Can't remove

Conclusion

None

Activity

James Wilson January 22, 2016 at 10:37 AM

Confirmed, we can run XRef, and have just deleted 2.3TB of orphan files that have built up in the last 4 months!

Sanjeev Sriavsatav January 22, 2016 at 10:33 AM

James, could you please confirm whether you are now able to process XREF files?

Jacob Cobbett-Smith January 21, 2016 at 12:09 PM

Marking as compatibility=point, because it could go in the next point build as a minor fix that only affects the progress of XREF.
There are workarounds, though, so if there is no further point release on the current minor version, it could go in the next minor build, i.e. 5.6.0.

Jacob Cobbett-Smith January 21, 2016 at 12:05 PM

I'm going to accept this JIRA and issue a pull request.
I can see one place where XREF is intolerant of badly formed names, which I think were the cause of all the issues here,
i.e. logical names deduced from physical parts.

There was a change some time ago that stopped XREF from halting when it hit a problem with the metadata for a single logical file (https://hpccsystems.atlassian.net/browse/HPCC-14397#icft=HPCC-14397).
But this case, where a logical file name is deduced from physical parts to check for orphans, was missed.
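
To illustrate the kind of tolerance meant here, a rough Python sketch follows. It is not the actual change to saxref.cpp, only the general idea: a badly formed deduced name is recorded and skipped instead of stopping the whole scan. The helper names, the `check_orphan` callback and the second sample name are all hypothetical.

```python
def validate_logical_name(name):
    """Raise ValueError if any '::'-separated component is empty."""
    if any(part == "" for part in name.split("::")):
        raise ValueError(f"Scope is blank in file name '{name}'")

def scan_orphans(deduced_names, check_orphan):
    """Check every deduced name; record bad ones instead of aborting the scan."""
    checked, skipped = [], []
    for name in deduced_names:
        try:
            validate_logical_name(name)
            check_orphan(name)
            checked.append(name)
        except ValueError as err:
            # One malformed name must not stop the remaining names being checked.
            skipped.append((name, str(err)))
    return checked, skipped

checked, skipped = scan_orphans(
    ["key::salt_coop::rid::meow::father::", "scrub::cuem::example"],
    check_orphan=lambda name: None,  # stand-in for the real orphan check
)
print("checked:", checked)
print("skipped:", skipped)
```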

James Wilson January 21, 2016 at 11:58 AM

Hurrah! Thanks for your help Jake. I'm just checking thor04_20way, but we know what to do if it doesn't work.

Fixed
Details

Compatibility: Point

Created January 20, 2016 at 12:25 PM
Updated January 25, 2016 at 10:07 AM
Resolved January 25, 2016 at 10:07 AM