Thursday, April 28, 2016
Moving to a new Blogging Website
I wanted to write a quick note. I am moving to a new blogging platform at https://djw8605.github.io/. Please update any RSS or Atom feeds you may have.
Wednesday, April 20, 2016
Querying an Elasticsearch Cluster for Gratia Records
For the last few days I have been working on email reports for GRACC, OSG's new prototype accounting system. The source of the reports is located on GitHub.
I have learned a significant amount about queries and aggregations in Elasticsearch. For example, below is the query that counts the number of records in a date range.
from elasticsearch_dsl import Search

def GetCountRecords(client, from_date, to_date, query=None):
    """
    Get the number of records (documents) from a date range
    """
    # Filter on the @timestamp field and ask Elasticsearch for only the
    # count, not the matching documents themselves.
    s = Search(using=client, index='gracc-osg-*') \
        .filter('range', **{'@timestamp': {'from': from_date, 'to': to_date}}) \
        .params(search_type="count")
    response = s.execute()
    return response.hits.total

The above query searches for records in the specified date range and counts them. It uses the elasticsearch-dsl Python library. It does not return the actual records, just a number. This is useful for generating raw counts and a delta for records processed over the last few days.
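As a quick usage sketch, the report can call this function for two adjacent windows and compare the counts. The connection details and date-math strings below are only examples, not the actual report configuration:

from elasticsearch import Elasticsearch

# Hypothetical connection; the real GRACC cluster address will differ.
client = Elasticsearch(['localhost:9200'])

# Count the last 24 hours and the 24 hours before that,
# using Elasticsearch date math for the range endpoints.
last_day = GetCountRecords(client, 'now-1d', 'now')
previous_day = GetCountRecords(client, 'now-2d', 'now-1d')
print("Last 24 hours: %d records (delta: %+d)" % (last_day, last_day - previous_day))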
The other query I designed aggregates the number of records per probe. This query is designed to help us understand differences in specific probes' reporting behavior.
from elasticsearch_dsl import Search, A

# Create the search and the aggregation (A) over probe names
s = Search(using=es, index='gracc-osg-*')
a = A('terms', field='ProbeName', size=0)

# Bucket the records into two day-long ranges on @timestamp, then
# aggregate each range by probe name.
s.aggs.bucket('day_range', 'range', field='@timestamp',
              ranges=[
                  {'from': 'now-1d', 'to': 'now'},
                  {'from': 'now-2d', 'to': 'now-1d'}
              ]) \
      .bucket('probenames', a)
response = s.execute()

This query is much more complicated than the simple count query above. First, it creates a search selecting the "gracc-osg-*" indexes. It also creates an aggregation "A", which will be used later to aggregate by the ProbeName field.
Next, we create a bucket called day_range, which is of type range. It aggregates over two ranges: the last 24 hours and the 24 hours before that. We then attach the ProbeName aggregation "A" defined above. In return we get, for each range and for each probe in that range, the number of records for that probe.
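Reading the result back out is a matter of walking the nested buckets. A minimal sketch, assuming the search above has executed and using the bucket names it defines:

# Walk the nested aggregation: one bucket per date range, and within
# each range, one bucket per probe name.
for day in response.aggregations.day_range.buckets:
    print("Range %s: %d records" % (day.key, day.doc_count))
    for probe in day.probenames.buckets:
        print("  %s: %d records" % (probe.key, probe.doc_count))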
This nested aggregation is a powerful feature that will be used in the summarization of the records.