Investigate cost/speed of default use of WUQueryDetails

Description

A number of environments and jobs use esp services from SOAPCALLS to manage the validity and sometimes deployment of queries to Roxie.

For example RAMPS uses WUQueryDetails in order to obtain the workunit a deployed query is based upon, and in a separate Esp service call, discover if there is a more current compiled version of that query.

WUQueryDetails is an expensive request and does a lot more than just retrieve the workunit id, e.g. it also collates all files and looks them in Dali (with the associated locking that entails).

Queries performing multiple WUQueryDetails (and other similar esp services) are therefore costly to themselves, to Dali and can have an impact on the performance of the whole environment.

Therefore it is important that esp services called from queries like this are efficient and perform the minimal requirements necessary, either by allowing additional options to prevent them doing unwanted tasks, or by created new service calls which are designed specifically to perform the task required.

In the RAMPS case, I think what is really wanted is a single call that asks whether a deployed query is up to date , and that it checks the query wuid against any newer compiled query.

Conclusion

None

Attachments

1

Activity

Show:

Kanghua Wang June 22, 2020 at 7:27 PM

Ok, I will try the fix.

Jacob Cobbett-Smith June 22, 2020 at 7:19 PM

WUListQueries or WUQueryDetails?
And if WUListQueries, there's a separate JIRA and fix: https://hpccsystems.atlassian.net/browse/HPCC-24273

Kanghua Wang June 22, 2020 at 7:17 PM

In my test, esp took 158.6 seconds to do a WUListQueries (>18506 queires). The post filter part took 158.3 seconds.

Drea Leed June 22, 2020 at 1:56 PM
Edited

Jake, in production, this is called about every 5 minutes by the workunit(s) running through, and I believe there's also an external healthcheck that is pretty constantly calling it to check the status of a few roxie queries. We had to disable a separate roxie monitor system because its pings to WUQueryDetails every 30 seconds to check on a single workunit's status were killing production.

 

AFAIK only WUListQueries and WUQueryDetails are the esp pain points still remaining.

Kanghua Wang June 21, 2020 at 10:03 PM

I run WUQueryDetails for batchr3dspsvc_mbs_hcfaciltoppaidnetworksv5.ins5202965_service_307264.1 using the existing default options twice.
The 1st time: the whole process took 129 seconds. The getQueryRegistry() took 74 seconds. The getQueryFiles() (which calls the createPackageSet()) took 54 seconds.
The 2nd time (more log lines in the getQueryFiles), the whole process took 53 seconds. The getQueryRegistry() took 0.2 seconds. The createPackageSet() took 53 seconds.

1st time:
00000362 PRG 2020-06-21 16:55:54.431 13382 13718 "Enter onWUQueryDetails"
00000363 PRG 2020-06-21 16:55:54.432 13382 13718 "Before getQueryRegistry tree"
...
0000036F PRG 2020-06-21 16:57:08.818 13382 13718 "SDSManager::connect(): xpath</QuerySets/QuerySet[@id="roxie"]>, RemoteConnection ID<211e66b2f2>, mode<2>, timeout<300000>"
00000370 PRG 2020-06-21 16:57:08.818 13382 13718 "After getQueryRegistry tree"
00000371 PRG 2020-06-21 16:57:09.046 13382 13718 "After resolveQueryAlias tree"
00000372 USR 2020-06-21 16:57:09.046 13382 13718 "WUQueryDetails: QuerySet roxie, query batchr3dspsvc_mbs_hcfaciltoppaidnetworksv5.ins5202965_service_307264.1"
00000373 PRG 2020-06-21 16:57:09.047 13382 13718 "Before openWorkUnit"
00000374 PRG 2020-06-21 16:57:09.049 13382 13718 "SDSManager::connect(): xpath</WorkUnits/W20191031-121447>, RemoteConnection ID<211e66b506>, mode<0>, timeout<300000>"
0000037F PRG 2020-06-21 16:57:09.094 13382 13718 "Before cw->getStatistic"
00000380 PRG 2020-06-21 16:57:09.098 13382 13718 "Before cw->getLibraries"
00000381 PRG 2020-06-21 16:57:09.105 13382 13718 "Before getQueryFiles"
00000382 PRG 2020-06-21 16:57:09.108 13382 13718 "SDSManager::connect(): xpath</WorkUnits/W20191031-121447>, RemoteConnection ID<211e66b509>, mode<0>, timeout<300000>"
...
0000038E PRG 2020-06-21 16:58:03.696 13382 13718 "SDSManager::connect(): xpath</WorkUnits/W20191031-121447>, RemoteConnection ID<211e66b6b6>, mode<0>, timeout<300000>"
...
00000393 PRG 2020-06-21 16:58:03.699 13382 13718 "SDSManager::connect(): xpath</GeneratedDlls/DlllibW20191031_121447_so[@name="libW20191031-121447.so"]>, RemoteConnection ID<211e66b6b9>, mode<0>, timeout<30000>"
...
00000397 PRG 2020-06-21 16:58:03.779 13382 13718 "Loading dll (libW20191031-121447.so) from location /var/lib/HPCCSystems/dllserver/temp/libW20191031-121447.so"
00000398 PRG 2020-06-21 16:58:03.781 13382 13718 "Before doGetTimers"
00000399 PRG 2020-06-21 16:58:03.784 13382 13718 "Before doGetGraphs"
0000039A PRG 2020-06-21 16:58:03.808 13382 13718 "Leave onWUQueryDetails"
0000039B PRG 2020-06-21 16:58:03.871 13382 13718 "TxSummary[activeReqs=1;auth=Ok;contLen=599;rcv=0ms;handleHttp=129458ms;user=kwang@192.168.56.1;req=SOAP wsworkunits.WUQueryDetails v1.77;total=129458ms;]"

2nd time:
0000013A PRG 2020-06-21 17:35:00.184 16970 17178 "Enter onWUQueryDetails"
0000013B PRG 2020-06-21 17:35:00.189 16970 17178 "Before getQueryRegistry tree"
0000013C PRG 2020-06-21 17:35:00.191 16970 17178 "SDSManager::connect(): xpath</QuerySets/QuerySet[@id="roxie"]>, RemoteConnection ID<211e67380a>, mode<2>, timeout<300000>"
0000013D PRG 2020-06-21 17:35:00.192 16970 17178 "After getQueryRegistry tree"
0000013E PRG 2020-06-21 17:35:00.394 16970 17178 "After resolveQueryAlias tree"
0000013F USR 2020-06-21 17:35:00.394 16970 17178 "WUQueryDetails: QuerySet roxie, query batchr3dspsvc_mbs_hcfaciltoppaidnetworksv5.ins5202965_service_307264.1"
00000140 PRG 2020-06-21 17:35:00.395 16970 17178 "Before openWorkUnit"
00000141 PRG 2020-06-21 17:35:00.396 16970 17178 "SDSManager::connect(): xpath</WorkUnits/W20191031-121447>, RemoteConnection ID<211e67380d>, mode<0>, timeout<300000>"
...
0000014B PRG 2020-06-21 17:35:00.444 16970 17178 "Before cw->getStatistic"
0000014C PRG 2020-06-21 17:35:00.447 16970 17178 "Before cw->getLibraries"
0000014D PRG 2020-06-21 17:35:00.449 16970 17178 "Before getQueryFiles"
0000014E PRG 2020-06-21 17:35:00.449 16970 17178 "In getQueryFiles"
0000014F PRG 2020-06-21 17:35:00.449 16970 17178 "After getTargetClusterInfo"
00000150 PRG 2020-06-21 17:35:00.449 16970 17178 "Before getWorkUnitFactory"
00000151 PRG 2020-06-21 17:35:00.449 16970 17178 "After getWorkUnitFactory"
00000152 PRG 2020-06-21 17:35:00.450 16970 17178 "SDSManager::connect(): xpath</WorkUnits/W20191031-121447>, RemoteConnection ID<211e673810>, mode<0>, timeout<300000>"
00000153 PRG 2020-06-21 17:35:00.450 16970 17178 "After openWorkUnit"
00000154 PRG 2020-06-21 17:35:00.451 16970 17178 "SDSManager::connect(): xpath</PackageSets/>, RemoteConnection ID<211e673813>, mode<242>, timeout<300000>"
00000155 PRG 2020-06-21 17:35:00.454 16970 17178 "SDSManager::connect(): xpath</PackageSets/PackageSet[@id="default_roxie1"]>, RemoteConnection ID<211e673816>, mode<1>, timeout<300000>"
00000156 PRG 2020-06-21 17:35:00.458 16970 17178 "SDSManager::connect(): xpath</PackageMaps/PackageMap[@id="roxie::hipie"]>, RemoteConnection ID<211e673819>, mode<1>, timeout<300000>"
00000157 PRG 2020-06-21 17:35:15.765 16970 16977 "SDSManager::connect(): xpath<Sessions/Process[@name="esp"]>, RemoteConnection ID<211e673891>, mode<2>, timeout<30000>"
00000158 PRG 2020-06-21 17:35:45.773 16970 16977 "SDSManager::connect(): xpath<Sessions/Process[@name="esp"]>, RemoteConnection ID<211e67396f>, mode<2>, timeout<30000>"
00000159 PRG 2020-06-21 17:35:45.828 16970 16980 "SDSManager::connect(): xpath</DFU/WorkUnits>, RemoteConnection ID<211e673972>, mode<0>, timeout<30000>"
0000015A PRG 2020-06-21 17:35:46.149 16970 16983 "SDSManager::connect(): xpath</WorkUnits>, RemoteConnection ID<211e673975>, mode<0>, timeout<300000>"
0000015B PRG 2020-06-21 17:35:53.210 16970 17178 "After createPackageSet"
0000015C PRG 2020-06-21 17:35:53.210 16970 17178 "After createReferencedFileList"
0000015D PRG 2020-06-21 17:35:53.285 16970 17178 "After addFilesFromQuery"
0000015E PRG 2020-06-21 17:35:53.285 16970 17178 "After resolveFiles"
0000015F PRG 2020-06-21 17:35:53.285 16970 17178 "Before sort files"
00000160 PRG 2020-06-21 17:35:53.285 16970 17178 "Leave getQueryFiles"
00000161 PRG 2020-06-21 17:35:53.802 16970 17178 "SDSManager::connect(): xpath</WorkUnits/W20191031-121447>, RemoteConnection ID<211e67399e>, mode<0>, timeout<300000>"
00000162 PRG 2020-06-21 17:35:53.802 16970 17178 "CACHE: CPermissionsCache Found validated user kwang"
00000163 PRG 2020-06-21 17:35:53.803 16970 17178 "CACHE: CResPermissionsCache Looking up resource(0 of 1) kwang:ramps_dev_svc"
00000164 PRG 2020-06-21 17:35:53.803 16970 17178 "CACHE: CResPermissionsCache FoundB kwang:ramps_dev_svc=>255"
00000165 PRG 2020-06-21 17:35:53.803 16970 17178 "CACHE: CPermissionsCache Looked up resources for kwang:*, found 1 of 1 matches"
00000166 PRG 2020-06-21 17:35:53.806 16970 17178 "SDSManager::connect(): xpath</GeneratedDlls/DlllibW20191031_121447_so[@name="libW20191031-121447.so"]>, RemoteConnection ID<211e6739a1>, mode<0>, timeout<30000>"
00000167 PRG 2020-06-21 17:35:53.811 16970 17178 "SDSManager::connect(): xpath</GeneratedDlls/DlllibW20191031_121447_so[@name="libW20191031-121447.so"]>, RemoteConnection ID<211e6739a4>, mode<0>, timeout<30000>"
00000168 PRG 2020-06-21 17:35:53.817 16970 17178 "Loading dll (libW20191031-121447.so) from location /var/lib/HPCCSystems/dllserver/temp/libW20191031-121447.so"
00000169 PRG 2020-06-21 17:35:53.819 16970 17178 "Before doGetTimers"
0000016A PRG 2020-06-21 17:35:53.823 16970 17178 "Before doGetGraphs"
0000016B PRG 2020-06-21 17:35:53.845 16970 17178 "Leave onWUQueryDetails"
0000016C PRG 2020-06-21 17:35:53.914 16970 17178 "TxSummary[activeReqs=1;auth=Ok;contLen=599;rcv=617ms;handleHttp=54357ms;user=kwang@192.168.56.1;req=SOAP wsworkunits.WUQueryDetails v1.77;total=54357ms;]"

Fixed
Pinned fields
Click on the next to a field label to start pinning.

Details

Components

Assignee

Reporter

Priority

Fix versions

Created June 18, 2020 at 11:21 AM
Updated July 15, 2020 at 4:13 PM
Resolved July 15, 2020 at 4:13 PM

Flag notifications