In the previous section we saw how to set up threat intel feeds. In this section we will see how to profile our telemetry and then incorporate these profiles into the triage rules.

The first thing we need to do is set up a profile table in HBase. To do so, go to the HBase shell and create a sample table like so:

 

$ /usr/hdp/current/hbase-client/bin/hbase shell
hbase(main):001:0> create 'profiler', 'P'
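You can optionally confirm that the table and its column family were created before moving on:

hbase(main):002:0> describe 'profiler'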

This is where your profile will be written. Now we need to configure the profiler. Pull up the following config file:

/usr/metron/${METRON_VERSION}/config/profiler.properties

Let's look at the file to see what the different properties mean:

profiler.workers=1 # Storm parallelism
profiler.executors=0 # Storm parallelism
profiler.input.topic=indexing # which stream to tap into
profiler.period.duration=1 # together with the setting below, this adjusts the profile window (bins)
profiler.period.duration.units=MINUTES
profiler.ttl=30
profiler.ttl.units=MINUTES
profiler.hbase.salt.divisor=1000 # salt applied to the HBase row key to spread keys evenly
profiler.hbase.table=profiler # name of the table we just created, where the profile will be stored
profiler.hbase.column.family=P # name of the column family where our profile will be stored
profiler.hbase.batch=10 # batch size for profile writes
profiler.hbase.flush.interval.seconds=30 # how frequently to flush to HBase

 

The important thing to note here is that I set up my profile window to be 1 minute for this demo. Generally you would want your profile window to be around 15 minutes, depending on your telemetry.
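For example, switching to a 15-minute window would only require changing the two duration properties (a sketch; tune the value to your own telemetry volume):

profiler.period.duration=15
profiler.period.duration.units=MINUTES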

To set up a profile, pull up the Stellar shell by typing:

/usr/metron/${METRON_VERSION}/bin/stellar -z ${ZOOKEEPER}:2181
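Here ${ZOOKEEPER} is assumed to already hold the host name of your ZooKeeper node; if it does not, set it first (node1 below is just a placeholder host name):

export ZOOKEEPER=node1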

 

Now let's set up our profiles. The first thing we need to do is edit the profiler config. Using the shell, as we have done in the previous examples, we pull up the profiler config from ZooKeeper like so:

 

profilerConfig := CONFIG_GET("PROFILER")

This will pull in your profiler config from ZooKeeper and store it in the shell variable called profilerConfig. To display the config (which should initially be empty), just type:

 

profilerConfig

This will display the config in the shell. Now let's edit the config. Type:

profilerConfig := SHELL_EDIT(profilerConfig)

This will pull up vi, or the editor of your choice, to allow you to make edits to the config. Now let's add the following profiles:

{
  "profiles": [
    {
      "profile": "squid-miss",
      "foreach": "ip_src_addr",
      "onlyif": "source.type == 'squid' and action == 'TCP_MISS'",
      "update": { "m": "STATS_ADD(m, 1)" },
      "result": "m"
    },
    {
      "profile": "url-length",
      "foreach": "ip_src_addr",
      "onlyif": "source.type == 'squid'",
      "update": { "m": "STATS_ADD(m, LENGTH(url))" },
      "result": "m"
    }
  ]
}

Let's see what we just did. We added two profiles on the stream. If you recall, as part of our parsed telemetry message we have the fields:

action with content like TCP_MISS

url with content like http://www.pravda.ru/science/

The first profile is called squid-miss. It profiles by source IP, from the squid telemetry only, and counts the number of TCP misses on the proxy over each time window we set up previously. In the update clause we see that every time the condition is true it adds 1 to the profile, thereby counting the number of times the condition has occurred. The second profile captures the length of the URLs that users hit per time window. There are many additional profiles that can be set up; refer to the instructions here: https://github.com/apache/incubator-metron/tree/master/metron-analytics/metron-profiler
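Note that these profiles rely on STATS_ADD creating the stats object on its first use. If you prefer to make the initialization explicit (or your version of the profiler requires it), you can add an init clause; a sketch for the first profile:

{
  "profile": "squid-miss",
  "foreach": "ip_src_addr",
  "onlyif": "source.type == 'squid' and action == 'TCP_MISS'",
  "init":   { "m": "STATS_INIT()" },
  "update": { "m": "STATS_ADD(m, 1)" },
  "result": "m"
}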

Now that we have the profiles set up, exit the editor and let's push the config back into ZooKeeper:

CONFIG_PUT("PROFILER", profilerConfig)
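To verify the push, you can read the config back out of ZooKeeper and display it:

profilerConfig := CONFIG_GET("PROFILER")
profilerConfig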

Now our config is uploaded and we need to start our profiler topology.  We do so by typing:

 

/usr/metron/${METRON_VERSION}/bin/start_profiler_topology.sh
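This script submits the profiler topology to Storm. If you prefer the command line, you can also confirm the topology is up with the Storm CLI (assuming storm is on your path):

storm list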

You should also see the profiler topology running in your Storm UI. Now let it run for a few minutes; it needs to accumulate data over multiple profile windows. To make sure the profiles are being created, log back into your HBase shell and scan the profile table:

scan 'profiler'
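If the table already contains a lot of rows, you can limit the scan output:

scan 'profiler', {LIMIT => 10}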

You should see rows whose keys contain our profile name (squid-miss) prefixed by the salt. Don't be alarmed by the value of each row. Here we are using a statistical sketch to profile our stream, and we store the sketch object itself in the table. We do this so that we can later merge these objects over multiple time windows and ask the profiler interesting statistical questions about our data.

Now that you have let your profiler run, go back into the Stellar shell:

profile := PROFILE_GET("squid-miss", "127.0.0.1", 2, "HOURS")
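The same call works for the second profile we defined; for example (again using 127.0.0.1, which we assume shows up as ip_src_addr in your squid telemetry):

urlProfile := PROFILE_GET("url-length", "127.0.0.1", 2, "HOURS")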

PROFILE_GET pulls up your profiles as an array (with one entry for each time window) and loads them into the shell variable called profile. Now let's play with the data and see what we can ask. Type the following in the shell:

STATS_MEAN(GET_FIRST(profile))

This will get you the mean of the values recorded by the squid-miss profile for the first profile window. We can also type:

STATS_PERCENTILE(GET_FIRST(profile), 90)

This will get us the value of the 90th percentile. We can also ask these questions over the entire 2-hour range.

merged := STATS_MERGE(profile)
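For example, you can now ask the same questions of the merged object:

STATS_MEAN(merged)
STATS_PERCENTILE(merged, 90)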

STATS_MERGE collapses all your profile windows into one, so these answers now cover the entire 2-hour range rather than a single window. We will see in the next section how to integrate this into a triage rule so that we can leverage the profile information when triaging alerts.

 
