Elasticsearch group by string in url

1/21/2017

I am new to Elasticsearch and am trying to do a basic aggregation.

Background: we are running Kubernetes with the default ELK stack from the Kubernetes repository. Inside the cluster we run nginx, and its logs arrive in Elasticsearch in this format:

{
  "_index": "logstash-2017.01.19",
  "_type": "fluentd",
  "_id": "AVm5LG7nhh_AdXAcBz7o",
  "_score": null,
  "_source": {
    "log": "10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:59 +0000] \"GET ///latest/backend/call.php?callback=jQuery111308542505159888693_1484870408978&_cba=pageview&_cbv=bbb4deaebbdfb32e1554b2ee3925558e960921d3&_cbb=&_cbs=&_cbapu=https%3A%2F%2Fwww.somewebsite.com%2F%3Fvcp%3Dd371618ae93f99%26refPa%3D1%26refID%3DExample_DE%2FAffilinet%2FNV%2FBanner%2FLogo%26emsrc%3DAffiliate%26pid%3D290476%26affmt%3D2%26affmn%3D1&_cbp=&_cbh=www.somewebsite.com&_cbsh=57ce86ee7516a2.10364792&_cbtt=&_cbr=http%3A%2F%2Fapp.web.com%2Fclick3.aspx%3Fref%3D290476%26site%3D3901%26type%3Dtext%26tnb%3D1&_cbl=https%3A%2F%2Fwww.somewebsite.com%2F%3Fvcp%3Dd371618ae93f99%26refPa%3D1%26refID%3DExample_DE%2FAffilinet%2FNV%2FBanner%2FLogo%26emsrc%3DAffiliate%26pid%3D290476%26affmt%3D2%26affmn%3D1&_cbpl=allowTracking&_=1484870408979 HTTP/2.0\" 10.0.82.176:9000 upstream_response_time 0.054 msec 1484870399.081 request_time 0.054200 276 \"-\" Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0 -\n",
    "stream": "stdout",
    "docker": {
      "container_id": "8ad096e55d76193c8aa6a739f351af04f5f1f1bb965ce30ba4e862c3743d034d"
    },
    "kubernetes": {
      "namespace_name": "default",
      "pod_name": "nginx-edge-0dba94aa01a7adee797c844458cda3e2-fbidb",
      "container_name": "nginxedge"
    },
    "tag": "kubernetes.var.log.containers.nginx-edge-0dba94aa01a7adee797c844458cda3e2-fbidb_default_nginxedge-8ad096e55d76193c8aa6a739f351af04f5f1f1bb965ce30ba4e862c3743d034d.log",
    "@timestamp": "2017-01-19T23:59:59+00:00"
  },
  "fields": {
    "@timestamp": [
      1484870399000
    ]
  },
  "highlight": {
    "log": [
      "?callback=jQuery111308542505159888693_1484870408978&@kibana-highlighted-field@_cba@/kibana-highlighted-field@=@kibana-highlighted-field@pageview@/kibana-highlighted-field@&_cbv"
    ]
  },
  "sort": [
    1484870399000
  ]
}

N.B. This was copied from Kibana, so it may contain some fields that Kibana itself requested or added.

I would like to count the 'hits' on URLs grouped by the distinct values of a certain parameter in the query string.

In pseudocode, the result should look like this:

hits: {
  '_cbh=www.example.com': 100,
  '_cbh=www.example2.com': 50,
  '_cbh=www.example3.com': 90
}
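
To make the desired grouping concrete, here is a minimal Python sketch that does the counting client-side on `log` strings that have already been fetched. It is not an Elasticsearch aggregation, only an illustration of the result I am after. The function names `cbh_from_log` and `count_cbh` and the request-line regex are my own assumptions, not part of any library.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumption: the request line appears in the nginx log as "<METHOD> <url> HTTP/<version>".
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')


def cbh_from_log(log_line):
    """Return the value of the _cbh query parameter in a log line, or None."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    # Everything after the first "?" is the query string.
    _path, _sep, query = match.group(1).partition("?")
    values = parse_qs(query).get("_cbh")
    return values[0] if values else None


def count_cbh(log_lines):
    """Count log lines per distinct _cbh value, keyed as '_cbh=<value>'."""
    counts = Counter()
    for line in log_lines:
        host = cbh_from_log(line)
        if host is not None:
            counts["_cbh=" + host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, shortened log lines in the same shape as the sample document.
    sample = [
        '10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:59 +0000] '
        '"GET ///latest/backend/call.php?_cba=pageview&_cbh=www.example.com HTTP/2.0" 276',
        '10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:59 +0000] '
        '"GET ///latest/backend/call.php?_cba=pageview&_cbh=www.example2.com HTTP/2.0" 276',
    ]
    print(dict(count_cbh(sample)))
```

Doing this in the client obviously does not scale to the full index, which is why I am looking for the equivalent aggregation inside Elasticsearch.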

Reading the documentation and trying to replicate its examples, I run into a '[FIELDDATA] Data too large' error.
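
My guess is that the error comes from aggregating directly on the analyzed `log` field, which would force Elasticsearch to load fielddata for the whole text into memory. One direction I am considering, sketched here under assumptions: if the `_cbh` value were extracted at ingest time into its own non-analyzed field (given the hypothetical name `cbh_host` below; no such field exists in my index yet), a terms aggregation on that field could look like the request body this Python snippet builds. The helper name `build_cbh_terms_query` is made up for illustration.

```python
import json


def build_cbh_terms_query(field="cbh_host", size=10):
    """Build a search body that counts documents per distinct value of `field`.

    `field` is a hypothetical non-analyzed field holding the extracted _cbh value.
    """
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {"exists": {"field": field}},  # skip documents without the field
        "aggs": {
            "hits": {
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    # The resulting JSON would be sent to the _search endpoint of the logstash-* indices.
    print(json.dumps(build_cbh_terms_query(), indent=2))
```

I have not verified this against my cluster, so the field name and the extraction step remain assumptions.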

-- E_lexy
elasticsearch
kubernetes
nginx

0 Answers