{"id":1649,"date":"2024-10-12T18:20:08","date_gmt":"2024-10-12T16:20:08","guid":{"rendered":"https:\/\/cln.io\/blog\/?p=1649"},"modified":"2024-10-12T21:06:47","modified_gmt":"2024-10-12T19:06:47","slug":"elevating-phishing-defence-with-on-prem-llms","status":"publish","type":"post","link":"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/","title":{"rendered":"CERT-EU Lightning Talk: Elevating phishing defence with On-Prem LLMs"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"empowering-soc-analysts-through-ai\"><strong>Empowering SOC Analysts through AI<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">NOTE: This content reflects my experience up to October 2024; LLMs might evolve and no longer need these tips &amp; tricks.<br>No LLMs were used to generate this blog post or my lightning talk.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"intro\">Intro.<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I had the opportunity to present a lightning talk at the CERT-EU 2024 conference.<br>As I&#8217;m a cyber engineer and not a data scientist, I have to stick to basic concepts. I wanted to present something that SOCs\/organizations might want to try when working with on-premise LLMs, so I presented these 2 concepts:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Benchmarking (Models + Prompts)<\/li>\n\n\n\n<li>Micro agents<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Table of contents<\/p>\n\n\n\n<nav aria-label=\"Table of Contents\" class=\"wp-block-table-of-contents\"><ol><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#empowering-soc-analysts-through-ai\">Empowering SOC Analysts through AI<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" 
href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#intro\">Intro.<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#concept-1-benchmarking\">Concept 1: Benchmarking<\/a><ol><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#benchmarking-models\">Benchmarking models<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#benchmarking-prompts\">Benchmarking prompts<\/a><\/li><\/ol><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#concept-2-micro-agents\"> Concept 2: Micro agents<\/a><ol><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#the-evaluation-micro-agent\">The evaluation micro agent<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#the-summarizing-microagent\">The summarizing microagent<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#stacking-combining-them\">Stacking\/combining them<\/a><\/li><\/ol><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#things-that-work-for-us\">Things that work for us<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/cln.io\/blog\/elevating-phishing-defence-with-on-prem-llms\/#what-s-next\">What&#8217;s next?<\/a><\/li><\/ol><\/nav>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"concept-1-benchmarking\">Concept 1: Benchmarking<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"benchmarking-models\">Benchmarking 
models<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A new LLM is published every other week, and it&#8217;s impossible for me to keep up with how well each one performs, what&#8217;s new, etc. So I&#8217;ve written a simple flow in Tines that grabs <strong>4000<\/strong> emails classified by analysts as clean, imposter, threat, spam, etc., and asks the LLM to form an opinion on each one. We save the output and compare it to what the humans thought of it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">(A blog post about the Tines flow is coming.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"benchmarking-prompts\">Benchmarking prompts<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Once you have a (work)flow to benchmark models, you can easily adjust it to also benchmark prompts (e.g. do we use XML hints or markdown hints?). Try out different prompts against the same benchmark; this allows you to check whether a prompt performs better or worse on the same subset of emails.<br><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-8-1024x574.png\" alt=\"\" class=\"wp-image-1655\" srcset=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-8-1024x574.png 1024w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-8-300x168.png 300w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-8-768x431.png 768w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-8-1536x861.png 1536w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-8-2048x1148.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"concept-2-micro-agents\"><br>Concept 2: Micro agents<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The second concept is (for lack of a better word) micro agents, which I&#8217;ve also heard called AI supervisors. The idea is that you use the 
LLM to evaluate its own output and\/or summarize the good stuff.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"the-evaluation-micro-agent\">The evaluation micro agent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Once in a while, the LLM does not respond the way you expect. What I suggest is to pipe the LLM output back to the LLM together with the prompt and ask whether the output makes sense. It&#8217;s basically using an LLM to babysit itself.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"561\" src=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pkhg-2-1024x561.png\" alt=\"\" class=\"wp-image-1660\" srcset=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pkhg-2-1024x561.png 1024w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pkhg-2-300x164.png 300w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pkhg-2-768x420.png 768w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pkhg-2-1536x841.png 1536w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pkhg-2-2048x1121.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"the-summarizing-microagent\">The summarizing microagent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The idea with the summarizing micro agent is to ask the LLM the same question several times (3 times in the example on the slide) and then ask the LLM to summarize those outputs. This eliminates odd replies like &#8220;Yes, I can analyze this email for you&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"577\" src=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-9-1024x577.png\" alt=\"\" class=\"wp-image-1661\" 
srcset=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-9-1024x577.png 1024w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-9-300x169.png 300w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-9-768x433.png 768w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-9-1536x866.png 1536w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-9-2048x1155.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"stacking-combining-them\">Stacking\/combining them<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You can combine these 2 concepts as well! Either the summarizing agent with a check, or checks with a summarizing agent.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"570\" src=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqhx-1024x570.png\" alt=\"\" class=\"wp-image-1662\" srcset=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqhx-1024x570.png 1024w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqhx-300x167.png 300w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqhx-768x427.png 768w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqhx-1536x855.png 1536w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqhx-2048x1140.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"things-that-work-for-us\">Things that work for us<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In my last slide, I talked about things that work for us, and you might want to give them a shot too!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The main concept is to leverage the &#8220;language&#8221; part of the phishing email with LLMs (go figure), so giving the model a bit less information often works 
better.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I also suggest removing email headers that appliances add. If an LLM &#8220;sees&#8221; that a sandbox cleared the email, it tends to respond with something like &#8220;Sandboxes x and y have analyzed this email and said it is clean, so this email is not phishing&#8221; even when it is a phishing email.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The last thing I would like you to try is to give the LLM the function \/ job description of the recipient, e.g. does it make sense that Lucy, a policy officer, receives an invoice for 2 shipping containers of humanitarian aid?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">From my experience, LLMs are very good at picking out &#8220;odd&#8221; emails addressed to certain jobs if you give them the context of the recipient&#8217;s role.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"538\" src=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqnr-1024x538.png\" alt=\"\" class=\"wp-image-1663\" srcset=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqnr-1024x538.png 1024w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqnr-300x158.png 300w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqnr-768x403.png 768w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqnr-1536x807.png 1536w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/SCR-20241012-pqnr-2048x1076.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-s-next\">What&#8217;s next?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Once you have tuned the prompt &amp; model, you can run every incoming email through the LLM and start flagging phishing emails before they even reach the inbox of your end user. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Either run it as a step at your edge\/boundary, or have it reactively pull the email from inboxes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"568\" src=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-10-1024x568.png\" alt=\"\" class=\"wp-image-1669\" srcset=\"https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-10-1024x568.png 1024w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-10-300x167.png 300w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-10-768x426.png 768w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-10-1536x853.png 1536w, https:\/\/cln.io\/blog\/wp-content\/uploads\/2024\/10\/image-10-2048x1137.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Notes and mentions:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Thank you CERT-EU for the conference, and for accepting my lightning talk.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Header image taken from CERT-EU, made by Robert Laszlo Kiss.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Empowering SOC Analysts through AI NOTE: This content is from my experiences until October 2024, LLM&#8217;s might evolve and no longer need these tips &amp; tricks.No LLMs were used to generate this blog post or 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1651,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[60,3,48,61,59],"tags":[],"class_list":["post-1649","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-automation","category-security","category-llm","category-tines"],"_links":{"self":[{"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/posts\/1649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/comments?post=1649"}],"version-history":[{"count":22,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/posts\/1649\/revisions"}],"predecessor-version":[{"id":1685,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/posts\/1649\/revisions\/1685"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/media\/1651"}],"wp:attachment":[{"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/media?parent=1649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/categories?post=1649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cln.io\/blog\/wp-json\/wp\/v2\/tags?post=1649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}