Welcome to the Red Book for HPCC Systems® 6.0.0. There are several sections in this Red Book as follows:

...

From HPCC Systems® 5.0.0, the new ECL Watch pages are used as the default, so if you are upgrading from a release using the legacy version, take a look at the ECL Watch Transition and Quick Guide to help you bridge the gap.

...

  • To raise an issue, use our Community Issue Tracker. Please create an account if you don't already have one, to get automatic updates as your issue progresses through the workflow.
  • To ask a developer a technical question about something you are doing or have encountered, post in the Developer Forum.
  • To add a note to the Red Book, please contact Lorraine Chapman with full details.

General HPCC Systems Core Platform

...

Note: The JavaScript Graph Viewer was introduced as a technical preview in HPCC Systems 5.2.0. You can find out more about it in the blog about new features in HPCC Systems 5.2.0, which illustrates how it will enable us to improve the quality of information we can provide about your job via graphs.

...

ds := DATASET(100, TRANSFORM({ unsigned id }, SELF.id := COUNTER));
i := INDEX(ds, { id }, 'myIndex');
BUILD(i);

This is fine when the dataset used to build the index is relatively simple. However, there are a couple of downsides:

  • The index can't be logically separated from the dataset used to create it.
  • Sometimes the dataset is very complicated (megabytes of source). When the index is subsequently used in a query, all the code to create it is also parsed.

From HPCC Systems 6.0.0, we recommend that you use the following new syntax which allows the two to be separated:

ds := DATASET(100, TRANSFORM({ unsigned id }, SELF.id := COUNTER));
i := INDEX({ unsigned id }, 'myIndex');
BUILD(i, ds);

The fields are mapped by name. This form of BUILD also supports the other BUILD options (e.g., new filename, distribute, etc.).

While this requires you to make some significant changes to your ECL code, it is worth doing to take advantage of the potential benefit of reduced compile times.
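
As a rough sketch of how the separation might look in practice, the example below keeps the index definition free of any reference to the dataset and supplies the dataset only at build time. The logical file name and the attribute names are hypothetical, and the use of OVERWRITE here assumes it is among the BUILD options this form accepts.

```ecl
// Hypothetical names: idKey and the logical file name are illustrative only.

// Index definition: refers only to the record layout and the logical file name.
idKey := INDEX({ UNSIGNED id }, '~demo::key::id');

// The dataset is only needed by the code that builds the index.
ds := DATASET(100, TRANSFORM({ UNSIGNED id }, SELF.id := COUNTER));

SEQUENTIAL(
  BUILD(idKey, ds, OVERWRITE),  // fields are mapped by name; OVERWRITE is assumed to be accepted here
  OUTPUT(idKey(id = 42))        // reading uses the index definition alone
);
```

In a real system the definition of idKey would typically live in a shared module, so that queries importing it do not need to parse the code that produces ds.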

https://track.hpccsystems.com/browse/HPCC-8663

REGEXFINDSET for splitting strings

REGEXFINDSET is a new ECL language feature that may be used to extract matching patterns from a string, or to split a string based on a pattern. Previously, you may have achieved this using one of the following workarounds:

  • Std.Str.SplitWords, but only if a single delimiter was enough.
  • Std.Str.SplitWords, after pre-processing the string with SubstituteOutXXX calls to map all delimiters to one.
  • Rolling custom C++. This was error prone and was re-invented by every ECL programmer.
  • Rolling custom ECL that converts the string to a dataset with a record per character, then uses rollup/group/dedup operations. This is extremely inefficient and impossible to maintain.

REGEXFINDSET provides significant performance benefits over these methods, which use multiple parsing and string operations to achieve the same results. Examples of how to use REGEXFINDSET include:

REGEXFINDSET('\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}' , str) /* generate a set of all email addresses from str */
REGEXFINDSET('[^.]+', str) /* generate a set of sentences from str */
REGEXFINDSET('[^ |^,|^\r|^\t|^"|^.]+', str) /* generate a set of all words contained in str */
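
To make the splitting case concrete, here is a minimal sketch that splits a string on several delimiters in one call. The input string and the choice of delimiters (comma, semicolon and space) are assumptions made for illustration.

```ecl
// Hypothetical input; the delimiter set (comma, semicolon, space) is an assumption.
str := 'red,green;blue yellow';

// Each match is a maximal run of characters that are not delimiters,
// so the set holds the pieces of str split on any of the three delimiters.
parts := REGEXFINDSET('[^,; ]+', str);

OUTPUT(parts);
```

With this input the resulting set should contain the four strings red, green, blue and yellow, which previously would have needed SubstituteOutXXX calls followed by Std.Str.SplitWords.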

https://track.hpccsystems.com/browse/HPCC-14402

JavaScript Graph Control

Due to the phasing out of NPAPI plugin support in Chrome, we have introduced a new 100% JavaScript Graph Control. Since this is effectively a brand new visualization, you can expect to see different layouts and behavior. However, for non-Chrome based browsers, there is still a 'legacy' option available. Please report any unexpected behavior via our Community Issue Tracker.

We noted the phasing out of NPAPI support in the Red Book for HPCC Systems 5.0.x. You can also find out more about the new 100% JavaScript Graph Control from the blog on our website.

https://track.hpccsystems.com/browse/HPCC-14573

Explicit control of MulticastTTL setting

A Roxie cluster is often all on a single subnet and does not span routers. With a multicast TTL of 1 (the new default), Roxie multicast traffic will not get forwarded beyond its own subnet, potentially saving network bandwidth. If, for some reason, multicast traffic does need to go across a router to another subnet, this setting can be increased by one for each router to be traversed. If you find you need to change the default setting, go to the RoxieCluster Attributes using the HPCC Systems Configuration Manager and change the value in the multicastTTL option to the number you require for your specific circumstances.

Note: Switch settings (such as enabling IGMP snooping) and VLANs can also be used for similar multicast isolation and bandwidth saving.

https://track.hpccsystems.com/browse/HPCC-13451

Distribute pull buffer size is now configurable

Distribute uses a fixed-size buffer to receive the distributed rows. When the buffer is exceeded, it spills to disk by default or, if the relevant configuration option is set, blocks until the downstream reader has read rows. The buffer size is now configurable, as other limits are. We recommend using the default setting, which is 32MB.

https://track.hpccsystems.com/browse/HPCC-14466

Dali backup queue limits are now configurable

The limits can now be set using the THROTTLE command from dafscontrol:

Usage : dafscontrol [<dali-ip>] THROTTLE <ip-or-cluster> <class> <limit> <ms-delay> <cpu-limit> <queue-limit>

class
0 = standard
1 = slow
Standard actions are regular file actions such as read, write, size, rename, createdir, and setdir. Slow actions are things like file copy, getcrc, and treecopy.

They are divided this way because they have very different usage patterns and durations. All the other options are configurable per class.

limit (std default 80, slow default 20)

The number of parallel transactions on which dafilesrv should work.

ms-delay (std default 1000, slow default 5000)

The maximum delay, in milliseconds, to introduce if the limit is exceeded. Any delayed transaction will start immediately, without waiting for the full delay, if a slot becomes available.

cpu-limit (std default 85, slow default 75)

The limit only applies if the CPU usage is above these figures. In other words, more than limit transactions can run in parallel if CPU usage is, for example, below 85%.

queue-limit (std default 1000, slow default 10000)

The upper limit on the number of transactions that can be queued pending an available slot.

https://track.hpccsystems.com/browse/HPCC-15322

NEW - TRACE activity saves data into a log file with no detectable performance overheads

Tracing is not output by default even if TRACE statements are present. The workunit debug value traceEnabled must be set or the default platform settings changed to always output tracing.

In Roxie, you can also request tracing on a deployed query by specifying traceEnabled=1 in the query XML. This means you can leave trace statements in the ECL without any detectable overhead until tracing is enabled. Tracing is output to the log file, in the form:

TRACE: value...

The syntax to use is:

myds := TRACE(ds [, traceOptions]);

Available options are:

  • Zero or more expressions, which act as a filter. Only rows matching the filter will be included in the tracing.
  • KEEP (n) indicating how many rows will be traced.
  • SKIP (n) indicating that n rows will be skipped before tracing starts.
  • SAMPLE (n) indicating that only every nth row is traced.
  • NAMED(string) providing the name for the rows in the tracing.

It is also possible to override the default value for KEEP at a global, per-workunit, or per-query level.

#option (traceEnabled, 1) // trace statements will be enabled
#option (traceLimit, 100) // overrides the default keep value (10) for activities that don't specify it. 

Note: You can use a stored boolean as the filter expression for a trace activity, allowing you to turn individual trace activities on and off.
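
The options above can be combined in a single TRACE activity. The following is a minimal sketch only: the dataset, the stored name traceOdd and the trace label oddIds are all hypothetical, chosen to show a stored boolean switch alongside a row filter, KEEP and NAMED.

```ecl
#OPTION('traceEnabled', 1);  // without this (or a platform default), nothing is traced

// Hypothetical stored switch so that this one trace can be turned on or off per run.
BOOLEAN traceOdd := TRUE : STORED('traceOdd');

ds := DATASET(100, TRANSFORM({ UNSIGNED id }, SELF.id := COUNTER));

// Trace at most 5 rows that have an odd id, labelled 'oddIds' in the log,
// and only when the stored switch is TRUE.
traced := TRACE(ds, traceOdd, id % 2 = 1, KEEP(5), NAMED('oddIds'));

OUTPUT(traced);  // TRACE passes every row through; only the logging is filtered
```

Because the filter only affects what is written to the log, the result of the query is the same whether tracing is enabled or not.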

https://track.hpccsystems.com/browse/HPCC-12872

ECL IDE 

...