I am new to Elasticsearch and am trying to do a basic aggregation.
Background: we are running Kubernetes with the default ELK stack from the Kubernetes repository. Inside the cluster we run nginx, and its logs arrive in Elasticsearch in this format:
{
  "_index": "logstash-2017.01.19",
  "_type": "fluentd",
  "_id": "AVm5LG7nhh_AdXAcBz7o",
  "_score": null,
  "_source": {
    "log": "10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:59 +0000] \"GET ///latest/backend/call.php?callback=jQuery111308542505159888693_1484870408978&_cba=pageview&_cbv=bbb4deaebbdfb32e1554b2ee3925558e960921d3&_cbb=&_cbs=&_cbapu=https%3A%2F%2Fwww.somewebsite.com%2F%3Fvcp%3Dd371618ae93f99%26refPa%3D1%26refID%3DExample_DE%2FAffilinet%2FNV%2FBanner%2FLogo%26emsrc%3DAffiliate%26pid%3D290476%26affmt%3D2%26affmn%3D1&_cbp=&_cbh=www.somewebsite.com&_cbsh=57ce86ee7516a2.10364792&_cbtt=&_cbr=http%3A%2F%2Fapp.web.com%2Fclick3.aspx%3Fref%3D290476%26site%3D3901%26type%3Dtext%26tnb%3D1&_cbl=https%3A%2F%2Fwww.somewebsite.com%2F%3Fvcp%3Dd371618ae93f99%26refPa%3D1%26refID%3DExample_DE%2FAffilinet%2FNV%2FBanner%2FLogo%26emsrc%3DAffiliate%26pid%3D290476%26affmt%3D2%26affmn%3D1&_cbpl=allowTracking&_=1484870408979 HTTP/2.0\" 10.0.82.176:9000 upstream_response_time 0.054 msec 1484870399.081 request_time 0.054200 276 \"-\" Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0 -\n",
    "stream": "stdout",
    "docker": {
      "container_id": "8ad096e55d76193c8aa6a739f351af04f5f1f1bb965ce30ba4e862c3743d034d"
    },
    "kubernetes": {
      "namespace_name": "default",
      "pod_name": "nginx-edge-0dba94aa01a7adee797c844458cda3e2-fbidb",
      "container_name": "nginxedge"
    },
    "tag": "kubernetes.var.log.containers.nginx-edge-0dba94aa01a7adee797c844458cda3e2-fbidb_default_nginxedge-8ad096e55d76193c8aa6a739f351af04f5f1f1bb965ce30ba4e862c3743d034d.log",
    "@timestamp": "2017-01-19T23:59:59+00:00"
  },
  "fields": {
    "@timestamp": [
      1484870399000
    ]
  },
  "highlight": {
    "log": [
      "?callback=jQuery111308542505159888693_1484870408978&@kibana-highlighted-field@_cba@/kibana-highlighted-field@=@kibana-highlighted-field@pageview@/kibana-highlighted-field@&_cbv"
    ]
  },
  "sort": [
    1484870399000
  ]
}
N.B. This was copied from Kibana, so it may contain some fields that Kibana itself requested.
I would like to count the hits on URLs that contain a certain parameter in the query string (here `_cbh`), grouped by the value of that parameter.
In pseudocode, the result should look like this:
hits: {
  '_cbh=www.example.com': 100,
  '_cbh=www.example2.com': 50,
  '_cbh=www.example3.com': 90
}
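To make the intent concrete, here is a rough Python sketch of the counting I have in mind, done client-side over the raw `log` strings rather than in Elasticsearch. The function name `count_cbh`, the regular expression, and the shortened sample lines are made up for illustration; they are not part of our setup.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical pattern: pulls the request target out of an nginx-style
# log line of the form "GET <target> HTTP/x.y".
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def count_cbh(log_lines):
    """Count requests per value of the _cbh query-string parameter."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # line does not contain a recognisable request
        query = urlsplit(match.group(1)).query
        # parse_qs returns a list of values per parameter name;
        # lines without _cbh contribute nothing.
        for value in parse_qs(query).get("_cbh", []):
            hits["_cbh=" + value] += 1
    return dict(hits)


# Shortened, made-up sample lines in the same shape as the "log" field.
sample = [
    '10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:59 +0000] '
    '"GET /latest/backend/call.php?_cba=pageview&_cbh=www.example.com HTTP/2.0" 276',
    '10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:58 +0000] '
    '"GET /latest/backend/call.php?_cba=pageview&_cbh=www.example2.com HTTP/2.0" 276',
    '10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:57 +0000] '
    '"GET /latest/backend/call.php?_cbh=www.example.com&_cba=click HTTP/2.0" 276',
    '10.10.82.1 - - - app.example.com [19/Jan/2017:23:59:56 +0000] '
    '"GET /healthz HTTP/1.1" 12',
]

print(count_cbh(sample))
```

This obviously does not scale to the full index, which is why I am looking for the equivalent aggregation inside Elasticsearch.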
Reading the documentation and trying to replicate its examples, I run into '[FIELDDATA] Data too large'.