I've been looking into some ways to remove duplicate events using a search. We can use various methods such as noted here. Great, this is fantastic information; however, |delete won't work after a non-streaming command, so actually removing the events as described won't work. Yes, dedup works fine, except in cases where you have an oops and all of the data in a particular index is now duplicated (or in some cases triplicated, resulting in a multitude of dupes over a multitude of events). Now any dashboards that rely on historical data need to be edited to include dedup for clarity. Our goal is to use |delete on the duplicate events so that they are not searched by default. Does anyone have a straightforward search for removing duplicates (using |delete) that can take a large number of duplicate records and leave only one? The whys and wherefores of how the data came to be duplicated are not of interest here.

There is no straightforward and easy way to do certain things efficiently in a distributed environment. This is especially true for deduping, which, by its very nature, can prove complex and expensive. And it is precisely for this reason that the why and the how should definitely be discussed, so that the problem can be solved instead of the symptom; attacking the root cause may prevent this from happening again in the future. Having said that, here are your options:

a. Modify your searches to include a deduplication pipeline (perhaps a transaction with maxspan=1s) before your stats or other reporting commands. Wrap that in a macro and inject it into every search that operates on said data.

b. Dedup your data and, while at it, create a summary index using only the fields that you are interested in.

c. Use | delete in the long and non-trivial way. Each event in any Splunk deployment, distributed or not, can be uniquely identified by a combination (e.g. concatenation) of the following fields: index, splunk_server and _cd. So:

1. Run a search that identifies all dupes and their respective ids. You can use transaction for this (or your own preferred method).
2. Put the ids in a multivalued field (the mvlist option in transaction).
3. For each transacted event, create a new field called delete_id, which will contain the id values of all events to be deleted. This is every value of the id field except one, and can be achieved by using mvindex.
4. Create a lookup table out of all delete_ids.
5. Look for events whose ids match the lookup table's delete_ids and pipe them through delete.

My solution was to do a subsearch that returns a deduped list of events, where the returned value was a unique field. You will need one unique field in your index of events. Run a search to both find and delete dupes:

    # /opt/splunk/bin/splunk add oneshot /tmp/fullODupes.txt
    | transaction _raw maxspan=1s keepevicted=true mvlist=t

This solution should delete every duplicate value. Note that the search has not been tested with a large number of events; it may be susceptible to stats or return limits, in which case a lookup table may be the best way to go about it.

Remember that delete is a capability: you will need a user role that has delete capabilities to do the delete, so check your capabilities before you attempt this.

1) Run the search index= and record the number of events returned by the search.
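The lookup-based steps in option c above can be strung together into two searches. This is an untested sketch, not the thread's literal solution: the index name (mydata), the lookup file name (delete_ids.csv), and the 1-second transaction span are placeholder assumptions, and | delete should always be rehearsed on a throwaway test index first. The first search builds a lookup of the ids to delete (every id in each duplicate group except the first, via mvindex):

```
index=mydata
| eval id = splunk_server . ":" . index . ":" . _cd
| transaction _raw maxspan=1s keepevicted=true mvlist=t
| where mvcount(id) > 1
| eval delete_id = mvindex(id, 1, -1)
| mvexpand delete_id
| table delete_id
| outputlookup delete_ids.csv
```

The second search recomputes the same id on each event, keeps only events whose id appears in the lookup, and pipes them to delete:

```
index=mydata
| eval id = splunk_server . ":" . index . ":" . _cd
| lookup delete_ids.csv delete_id AS id OUTPUT delete_id AS to_delete
| where isnotnull(to_delete)
| delete
```

Both searches must run over the same time range. The eval/lookup/where chain in the second search is streaming, which | delete requires, but behavior varies by Splunk version; if delete refuses the pipeline, verify the matched event count first and narrow the base search instead.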