Framework to easily detect outliers in Elasticsearch events.
Developed in Python and fully dockerized!
ee-outliers is a framework to detect statistical outliers in events stored in an Elasticsearch cluster. It uses easy-to-write, user-defined configuration files to decide which events should be analysed for outliers, and how.
The framework was developed to detect anomalies in security events, but it can just as well be used to detect outliers in other data.
The only thing you need is Docker and an Elasticsearch cluster and you are ready to start your hunt for outlier events!
Although we love Elasticsearch, its search language is still lacking support for complex queries that allow for advanced analysis and detection of outliers - features we came to love while using other tools such as Splunk.
This framework tries to address these limitations by allowing the user to write simple use cases that help spot outliers in their data using statistical models. Machine learning models are under development.
The framework makes use of statistical models that are easily defined by the user in a configuration file. In case the models detect an outlier, the relevant Elasticsearch events are enriched with additional outlier fields. These fields can then be dashboarded and visualized using the tools of your choice (Kibana or Grafana for example).
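To make the idea of "detect, then enrich" concrete, here is a minimal, generic sketch in Python. It is not the ee-outliers implementation: the function name `enrich_with_outliers`, the `sensitivity` parameter, the example field `bytes_sent` and the layout of the added `outliers` field are all assumptions made for illustration. It flags events whose numeric value lies more than a chosen number of standard deviations above the mean and attaches extra fields to them.

```python
import statistics


def enrich_with_outliers(events, field, sensitivity=3.0):
    """Return copies of the events, adding an 'outliers' field to flagged ones.

    Hypothetical helper: an event is flagged when its value for `field`
    exceeds mean + sensitivity * (population standard deviation).
    """
    values = [event[field] for event in events]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    threshold = mean + sensitivity * stdev

    enriched = []
    for event in events:
        doc = dict(event)  # copy, so the input events are left untouched
        if stdev > 0 and event[field] > threshold:
            # Assumed field layout, chosen for this sketch only.
            doc["outliers"] = {
                "reason": f"{field} above mean + {sensitivity} stdev",
                "threshold": threshold,
            }
        enriched.append(doc)
    return enriched


# Twenty ordinary events and one unusually large one.
events = [{"bytes_sent": 10} for _ in range(20)] + [{"bytes_sent": 1000}]
enriched = enrich_with_outliers(events, "bytes_sent")
flagged = [doc for doc in enriched if "outliers" in doc]
print(len(flagged), flagged[0]["bytes_sent"])
```

In a real deployment the enriched documents would be written back to Elasticsearch, so that the extra fields can be filtered on in a dashboard.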
The types of anomalies you can spot using ee-outliers are virtually limitless. A few examples of types of outliers we have detected ourselves using ee-outliers during threat hunting activities include:
Visit the Getting started page to begin detecting outliers in Elasticsearch yourself!
ee-outliers is developed & maintained by NVISO Labs.
You can reach out to the developers of ee-outliers by creating an issue on GitHub.
For any other communication, you can reach out by sending us an e-mail at [email protected].
We write about our research on our blog: https://blog.nviso.eu
You can follow us on twitter: https://twitter.com/NVISO_Labs
Thank you for using ee-outliers and we look forward to your feedback! 🐀
ee-outliers is released under the GNU GENERAL PUBLIC LICENSE v3 (GPL-3). See the LICENSE file.
We are grateful for the support received from INNOVIRIS and the Brussels region in funding our Research & Development activities.
Bumps certifi from 2017.7.27.1 to 2022.12.7.
- 9e9e840 2022.12.07
- b81bdb2 2022.09.24
- 939a28f 2022.09.14
- aca828a 2022.06.15.2
- de0eae1 Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...
- b8eb5e9 2022.06.15.1
- 47fb7ab Fix deprecation warning on Python 3.11 (#199)
- b0b48e0 fixes #198 -- update link in license
- 9d514b4 2022.06.15
- 4151e88 Add py.typed to MANIFEST.in to package in sdist (#196)
Bumps pillow from 9.0.1 to 9.3.0.
Sourced from pillow's releases.
9.3.0
https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html
Changes
- Initialize libtiff buffer when saving #6699 [@radarhere]
- Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [@wiredfool]
- Inline fname2char to fix memory leak #6329 [@nulano]
- Fix memory leaks related to text features #6330 [@nulano]
- Use double quotes for version check on old CPython on Windows #6695 [@hugovk]
- GHA: replace deprecated set-output command with GITHUB_OUTPUT file #6697 [@nulano]
- Remove backup implementation of Round for Windows platforms #6693 [@cgohlke]
- Upload fribidi.dll to GitHub Actions #6532 [@nulano]
- Fixed set_variation_by_name offset #6445 [@radarhere]
- Windows build improvements #6562 [@nulano]
- Fix malloc in _imagingft.c:font_setvaraxes #6690 [@cgohlke]
- Only use ASCII characters in C source file #6691 [@cgohlke]
- Release Python GIL when converting images using matrix operations #6418 [@hmaarrfk]
- Added ExifTags enums #6630 [@radarhere]
- Do not modify previous frame when calculating delta in PNG #6683 [@radarhere]
- Added support for reading BMP images with RLE4 compression #6674 [@npjg]
- Decode JPEG compressed BLP1 data in original mode #6678 [@radarhere]
- pylint warnings #6659 [@marksmayo]
- Added GPS TIFF tag info #6661 [@radarhere]
- Added conversion between RGB/RGBA/RGBX and LAB #6647 [@radarhere]
- Do not attempt normalization if mode is already normal #6644 [@radarhere]
- Fixed seeking to an L frame in a GIF #6576 [@radarhere]
- Consider all frames when selecting mode for PNG save_all #6610 [@radarhere]
- Don't reassign crc on ChunkStream close #6627 [@radarhere]
- Raise a warning if NumPy failed to raise an error during conversion #6594 [@radarhere]
- Only read a maximum of 100 bytes at a time in IMT header #6623 [@radarhere]
- Show all frames in ImageShow #6611 [@radarhere]
- Allow FLI palette chunk to not be first #6626 [@radarhere]
- If first GIF frame has transparency for RGB_ALWAYS loading strategy, use RGBA mode #6592 [@radarhere]
- Round box position to integer when pasting embedded color #6517 [@radarhere]
- Removed EXIF prefix when saving WebP #6582 [@radarhere]
- Pad IM palette to 768 bytes when saving #6579 [@radarhere]
- Added DDS BC6H reading #6449 [@ShadelessFox]
- Added support for opening WhiteIsZero 16-bit integer TIFF images #6642 [@JayWiz]
- Raise an error when allocating translucent color to RGB palette #6654 [@jsbueno]
- Moved mode check outside of loops #6650 [@radarhere]
- Added reading of TIFF child images #6569 [@radarhere]
- Improved ImageOps palette handling #6596 [@PososikTeam]
- Defer parsing of palette into colors #6567 [@radarhere]
- Apply transparency to P images in ImageTk.PhotoImage #6559 [@radarhere]
- Use rounding in ImageOps contain() and pad() #6522 [@bibinhashley]
- Fixed GIF remapping to palette with duplicate entries #6548 [@radarhere]
- Allow remap_palette() to return an image with less than 256 palette entries #6543 [@radarhere]
- Corrected BMP and TGA palette size when saving #6500 [@radarhere]
... (truncated)
Sourced from pillow's changelog.
9.3.0 (2022-10-29)
Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]
Initialize libtiff buffer when saving #6699 [radarhere]
Inline fname2char to fix memory leak #6329 [nulano]
Fix memory leaks related to text features #6330 [nulano]
Use double quotes for version check on old CPython on Windows #6695 [hugovk]
Remove backup implementation of Round for Windows platforms #6693 [cgohlke]
Fixed set_variation_by_name offset #6445 [radarhere]
Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]
Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]
Added ExifTags enums #6630 [radarhere]
Do not modify previous frame when calculating delta in PNG #6683 [radarhere]
Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]
Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]
Added GPS TIFF tag info #6661 [radarhere]
Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]
Do not attempt normalization if mode is already normal #6644 [radarhere]
... (truncated)
- d594f4c Update CHANGES.rst [ci skip]
- 909dc64 9.3.0 version bump
- 1a51ce7 Merge pull request #6699 from hugovk/security-libtiff_buffer
- 2444cdd Merge pull request #6700 from hugovk/security-samples_per_pixel-sec
- 744f455 Added release notes
- 0846bfa Add to release notes
- 799a6a0 Fix linting
- 00b25fd Hide UserWarning in logs
- 05b175e Tighter test case
- 13f2c5a Prevent DOS with large SAMPLESPERPIXEL in Tiff IFD
If you look at the function process_outlier() in analyzer.py, you can see that self.total_outliers is incremented even if the outlier is whitelisted: https://github.com/NVISO-BE/ee-outliers/blob/58021dc20f6cbbe411c0a6337ea39a82fc139a9d/app/helpers/analyzer.py#L220-L235 I also observed that the number of whitelisted outliers, held in the variable self.nr_whitelisted_elements, is never incremented in that function. As a result, whitelisted outliers are not counted in the simplequery, word2vec and sudden_appearance models.
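A minimal sketch of the counting behaviour this issue asks for is shown below. The class `OutlierCounter` and its whitelist check are hypothetical stand-ins, not the actual code in analyzer.py; only the counter names total_outliers and nr_whitelisted_elements are taken from the description above.

```python
class OutlierCounter:
    """Hypothetical stand-in for the analyzer's outlier bookkeeping."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self.total_outliers = 0
        self.nr_whitelisted_elements = 0

    def is_whitelisted(self, outlier_summary):
        # Simplified check for this sketch: exact match on the summary.
        return outlier_summary in self.whitelist

    def process_outlier(self, outlier_summary):
        """Count the outlier; return True if it was kept, False if whitelisted."""
        if self.is_whitelisted(outlier_summary):
            # Whitelisted outliers get their own counter and are
            # not added to the total.
            self.nr_whitelisted_elements += 1
            return False
        self.total_outliers += 1
        return True


counter = OutlierCounter(whitelist=["known benign task"])
for summary in ["known benign task", "rare process", "another rare process"]:
    counter.process_outlier(summary)
print(counter.total_outliers, counter.nr_whitelisted_elements)
```

Keeping the two counters mutually exclusive means their sum is the number of candidate outliers the model produced, which makes the run summary easier to check.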
Add a parameter `min_aggregator_bucket` with a default value of around 1000. An event would be classified as an outlier only if its aggregator bucket is larger than `min_aggregator_bucket`.
Why? From observing the outliers in production, nearly all of the false positives have a small number of events in their aggregation bucket.
To illustrate the idea, let's observe the following example use-case:
```
[sudden_appearance_winlog_renamed_process]
es_query_filter=_exists_:winlog.event_id AND winlog.event_id: 1

aggregator=winlog.event_data.Description.keyword
target=process.name

history_window_days=7
history_window_hours=0

sliding_window_size=03:00:00
sliding_window_step_size=00:01:00

outlier_type=first observation
outlier_reason=sudden appearance of a renamed process
outlier_summary=sudden appearance of a process renamed to {process.name} with description {winlog.event_data.Description}

run_model=1
test_model=0
```
It is worth noting that `winlog.event_data.Description`, which is selected as the `aggregator` parameter, corresponds to the process description, which stays constant even if you change the name of the process.
Therefore, the goal of this use-case is to catch events whose process name suddenly changes (ATT&CK T1218 or T1036) while the description stays the same.
Example: `powershell.exe` with description `Windows PowerShell` that is suddenly renamed to `catchme.exe`.
If an event is caught as an outlier with an aggregator bucket of size close to 1, it simply means that a new, unseen process is suddenly running, not that a process has been suddenly renamed. Conversely, if the bucket is large, it means that many events have been observed with a certain description and a certain name, and that the name has suddenly changed.
This improvement is easy to implement and, from my point of view, essential to make `sudden_appearance` work efficiently.
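The proposed filter can be sketched as follows. This is a simplified illustration rather than the ee-outliers implementation: the function `find_sudden_appearances` and its arguments are hypothetical, events are reduced to (aggregator value, target value) pairs, and the bucket size is counted over the history only. Only the parameter name `min_aggregator_bucket` comes from the proposal above.

```python
from collections import defaultdict


def find_sudden_appearances(history, window, min_aggregator_bucket=1000):
    """Return (aggregator, target) pairs from `window` that are new AND
    belong to an aggregator bucket larger than `min_aggregator_bucket`.

    history, window: iterables of (aggregator_value, target_value) pairs.
    """
    seen_targets = defaultdict(set)
    bucket_size = defaultdict(int)
    for aggregator, target in history:
        seen_targets[aggregator].add(target)
        bucket_size[aggregator] += 1

    outliers = []
    for aggregator, target in window:
        is_new = target not in seen_targets.get(aggregator, set())
        is_large_bucket = bucket_size.get(aggregator, 0) > min_aggregator_bucket
        if is_new and is_large_bucket:
            outliers.append((aggregator, target))
    return outliers


# A well-known description seen 1500 times, and a rarely seen one.
history = [("Windows PowerShell", "powershell.exe")] * 1500
history += [("Rare tool", "rare.exe")]

window = [
    ("Windows PowerShell", "catchme.exe"),  # renamed process: large bucket
    ("Rare tool", "other.exe"),             # new name, but tiny bucket
    ("Brand new tool", "new.exe"),          # never seen before at all
]
print(find_sudden_appearances(history, window))
```

With the threshold in place, only the renamed PowerShell binary is reported; the two events from small or empty buckets, which would otherwise be likely false positives, are dropped.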
@daanraman @michielmeersmans What do you think?
Hello, ee-outliers seems like a good project. Do you plan to add a "notifier" such as "TheHive" or others? SMTP is the only option for the moment.
outliers netsec threat-hunting statistics security-monitoring anomaly-detection outlier-detection siem cirt security-operations ee-outliers ml machine-learning statistical-analysis