Track file download activity in JupyterHub?

7/23/2019

I have JupyterHub 0.7.0 deployed to AWS managed Kubernetes (EKS).

I need to collect metrics for users that are downloading files from their individual user notebook servers.

Questions:

  1. Are there any logs that are emitted from JupyterHub that will show file download activity? I need to see which user (or which pod) the file download happened from.
  2. Where are these logs and how can I consume them?

Note: By "file download" I mean the "Download" button on the Jupyter home page (see the provided screenshot).

[Screenshot: Jupyter home page showing the "Download" button]

-- James Wierzba
amazon-eks
amazon-web-services
jupyter-notebook
kubernetes

2 Answers

7/25/2019

I was able to get the data I need from AWS ELB access logs.

This required a configuration change: the "proxy-public" ELB listener must listen for HTTP traffic, not TCP traffic. With a TCP listener the load balancer does not parse requests, so the access log has no request line to record. (The "proxy-public" ELB listener is created implicitly when JupyterHub is installed with Helm.)

Each ELB access log entry has a request field. For a file download, that field is formatted like so:

GET https://{DOMAIN}:443/user/{USERNAME}/files/{FILENAME}?download=1 HTTP/1.1

Where:

DOMAIN is the hosted domain for JupyterHub
USERNAME is the JupyterHub user
FILENAME is the file that was downloaded
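
For anyone who wants to turn those request fields into per-user metrics, here is a rough Python sketch. It assumes the access log lines have already been downloaded and are being read as plain text, and that JupyterHub is served from the root of the domain (no base URL prefix). The names `parse_download` and `count_downloads_by_user` are my own for illustration, not part of JupyterHub or any AWS tooling.

```python
import re
from urllib.parse import unquote

# Matches the request field described above:
#   GET https://{DOMAIN}:443/user/{USERNAME}/files/{FILENAME}?download=1 HTTP/1.1
# Assumption: JupyterHub is served from "/" (no base URL prefix).
DOWNLOAD_REQUEST = re.compile(
    r"GET https?://(?P<domain>[^/:\s]+)(?::\d+)?"
    r"/user/(?P<username>[^/\s]+)"
    r"/files/(?P<filename>[^?\s]+)"
    r"\?download=1(?:&\S*)? HTTP/\d(?:\.\d)?"
)


def parse_download(log_line):
    """Return domain, username and filename for a download request, else None."""
    match = DOWNLOAD_REQUEST.search(log_line)
    if match is None:
        return None
    return {
        "domain": match.group("domain"),
        # URL-decode so that e.g. "%20" in a file name becomes a space.
        "username": unquote(match.group("username")),
        "filename": unquote(match.group("filename")),
    }


def count_downloads_by_user(lines):
    """Count download requests per user across an iterable of log lines."""
    counts = {}
    for line in lines:
        hit = parse_download(line)
        if hit is not None:
            counts[hit["username"]] = counts.get(hit["username"], 0) + 1
    return counts
```

Feeding it the lines of a log file (for example `count_downloads_by_user(open(path))`, where `path` is wherever you saved the log) gives a dict mapping each user to a download count. Lines that are not file downloads are skipped.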

-- James Wierzba
Source: StackOverflow

7/23/2019

You should be able to do that by reading the logs in /var/log/jupyterhub.log and searching for the name of the file you want metrics for.

If you provide the logs I might be able to help further.

But if the logs you need are not there, then I think you will need to get them directly from the individual users' Jupyter containers:

docker logs jupyter-<user_name>
-- Crou
Source: StackOverflow