Thursday, April 28, 2016
Moving to a new Blogging Website
I wanted to write a quick note. I am moving to a new blogging platform at https://djw8605.github.io/. Please update any RSS or Atom feeds you may have.
Wednesday, April 20, 2016
Querying an Elasticsearch Cluster for Gratia Records
For the last few days I have been working on email reports for GRACC, OSG's new prototype accounting system. The source of the reports is located on GitHub.
I have learned a significant amount about queries and aggregations in Elasticsearch. For example, below is the query that counts the number of records in a date range.
from elasticsearch_dsl import Search

def GetCountRecords(client, from_date, to_date, query=None):
    """
    Get the number of records (documents) from a date range
    """
    # Filter on the @timestamp field and ask Elasticsearch for only the
    # count, not the matching documents themselves.
    s = Search(using=client, index='gracc-osg-*') \
        .filter('range', **{'@timestamp': {'from': from_date, 'to': to_date}}) \
        .params(search_type="count")
    response = s.execute()
    return response.hits.total

The above query searches for records in the specified date range and counts them. It uses the elasticsearch-dsl Python library. It does not return the actual records, just a number. This is useful for generating raw counts and a delta for records processed over the last few days.
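As a quick usage sketch, the report can call this function for two adjacent windows and compare the counts. The connection details and date-math strings below are only examples, not the actual report configuration:

from elasticsearch import Elasticsearch

# Hypothetical connection; the real GRACC cluster address will differ.
client = Elasticsearch(['localhost:9200'])

# Count the last 24 hours and the 24 hours before that,
# using Elasticsearch date math for the range endpoints.
last_day = GetCountRecords(client, 'now-1d', 'now')
previous_day = GetCountRecords(client, 'now-2d', 'now-1d')
print("Last 24 hours: %d records (delta: %+d)" % (last_day, last_day - previous_day))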
The other query I designed aggregates the number of records per probe. This query is designed to help us understand differences in specific probes' reporting behavior.
from elasticsearch_dsl import Search, A

# Create the search and the aggregation (A) over probe names
s = Search(using=es, index='gracc-osg-*')
a = A('terms', field='ProbeName', size=0)

# Bucket the records into two day-long ranges on @timestamp, then
# aggregate each range by probe name.
s.aggs.bucket('day_range', 'range', field='@timestamp',
              ranges=[
                  {'from': 'now-1d', 'to': 'now'},
                  {'from': 'now-2d', 'to': 'now-1d'}
              ]) \
      .bucket('probenames', a)
response = s.execute()

This query is much more complicated than the simple count query above. First, it creates a search selecting the "gracc-osg-*" indexes. It also creates an aggregation "A", which will be used later to aggregate by the ProbeName field.
Next, we create a bucket called day_range, which is of type range. It aggregates over two ranges: the last 24 hours and the 24 hours before that. We then attach the ProbeName aggregation "A" defined above. In return we get, for each range and for each probe in that range, the number of records for that probe.
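Reading the result back out is a matter of walking the nested buckets. A minimal sketch, assuming the search above has executed and using the bucket names it defines:

# Walk the nested aggregation: one bucket per date range, and within
# each range, one bucket per probe name.
for day in response.aggregations.day_range.buckets:
    print("Range %s: %d records" % (day.key, day.doc_count))
    for probe in day.probenames.buckets:
        print("  %s: %d records" % (probe.key, probe.doc_count))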
This nested aggregation is a powerful feature that will be used in the summarization of the records.