My Technical Blog

Posts

Showing posts from 2016

How to use VisualVM

VisualVM is very helpful for finding performance bottlenecks in a Java application. It is one of the easiest profiling tools for Java.
Download VisualVM: https://visualvm.github.io/
Run VisualVM and check the locally running Java apps.
Remote Profiling
Run your Java application with the following JVM arguments:
    -Djavax.management.builder.initial=
    -Dcom.sun.management.jmxremote
    -Dcom.sun.management.jmxremote.port=9010
    -Dcom.sun.management.jmxremote.local.only=false
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false
These parameters make your remote Java application listen for JMX connections on port 9010. You can then connect to it from VisualVM via Menu->File->Add JMX connection. Type your hostname and port. Example: 192.168.10.10:9010 (IP address of the remote machine and port).
Performance Profiling
After you connect to your app fro...
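To check that the JMX port from the arguments above is reachable before opening VisualVM, a small client can connect to it and read one value. The following is a minimal sketch, assuming the application was started with the flags listed above (no authentication, no SSL). The class name `JmxHeapCheck` and its helper methods are hypothetical names chosen for illustration; the JMX calls themselves are from the standard `javax.management` and `java.lang.management` packages.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: connects to a remote JVM over JMX and reads its heap usage.
class JmxHeapCheck {

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Connects without credentials (matches authenticate=false, ssl=false)
    // and returns the number of heap bytes currently used by the remote JVM.
    static long remoteHeapUsed(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage().getUsed();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java JmxHeapCheck <host> <port>");
            return;
        }
        long used = remoteHeapUsed(args[0], Integer.parseInt(args[1]));
        System.out.println("Remote heap used (bytes): " + used);
    }
}
```

If this connects and prints a number for `192.168.10.10 9010`, the same host and port should also work in the VisualVM "Add JMX connection" dialog.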

Performance tuning for Web engine

Install Tsung on CentOS
Prerequisites:
1. Install Erlang:
    sudo yum -y update && sudo yum -y upgrade
    sudo yum install epel-release
    sudo yum -y install erlang perl perl-RRD-Simple.noarch perl-Log-Log4perl-RRDs.noarch gnuplot perl-Template-Toolkit
2. Get Tsung:
    wget http://tsung.erlang-projects.org/dist/tsung-1.6.0.tar.gz
3. Extract and install:
    tar zxvf tsung-1.6.0.tar.gz
    cd tsung-1.6.0
    ./configure && make && sudo make install
Note: Sample XML configurations are located in /usr/share/doc/tsung/examples/http_simple.xml
Set up Cluster Testing with Tsung
1. Add the cluster nodes to each node's /etc/hosts:
    sudo vi /etc/hosts
    # cluster nodes
    192.168.10.10 n1
    192.168.10.11 n2
    192.168.10.12 n3
    192.168.10.13 n4
2. Set up the ~/.ssh/config file:
    vi ~/.ssh/config
    Host n1
    Hostname n1 ...

Java: BloomFilter Benchmark

Intro
A Bloom filter is a probabilistic data structure for testing whether an element is in a data set. It is similar to a HashSet in that it tells us whether the set contains a certain element or not. The difference is that a TRUE result from contains(element) is only probabilistic: false positives are possible, while false negatives are not. In our example we set the false-positive probability to 0.01, which means that roughly 1% of lookups for elements that are not in the set will still answer "It contains". Read more about Bloom filters here: https://en.wikipedia.org/wiki/Bloom_filter
Scenario
We create two arrays of random elements, each with 1,000,000 elements. We insert the first array into the BloomFilter, then iterate over the first array and check whether each item is contained in the BloomFilter. The second array is used only for checking non-existing elements. We do the same for HashSet.
Benchmarking code
We used a customized version of the Bloom filter which can accept a byte array. (The previous version of this post encoded a string for every put and contains, which was misguiding...
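The behaviour described above (no false negatives, a tunable false-positive rate) can be sketched with a small self-contained filter. This is not the customized byte-array filter the post benchmarks: `SimpleBloomFilter` is a hypothetical class written for illustration, it stores `long` keys rather than byte arrays, and it uses the textbook sizing formulas and double hashing rather than any particular library's implementation.

```java
import java.util.BitSet;
import java.util.Random;

// Illustrative Bloom filter over long keys (an assumption; the post's filter takes byte arrays).
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int expectedInsertions, double falsePositiveProbability) {
        double ln2 = Math.log(2);
        // Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        this.numBits = Math.max(1, (int) Math.ceil(
                -expectedInsertions * Math.log(falsePositiveProbability) / (ln2 * ln2)));
        this.numHashes = Math.max(1,
                (int) Math.round((double) numBits / expectedInsertions * ln2));
        this.bits = new BitSet(numBits);
    }

    // 64-bit mixing function (the splitmix64 finalizer) used to scramble keys.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Double hashing: the i-th bit position is (h1 + i * h2) mod m.
    private int index(long h1, long h2, int i) {
        return (int) Math.floorMod(h1 + (long) i * h2, (long) numBits);
    }

    void put(long element) {
        long h1 = mix(element + 0x9e3779b97f4a7c15L);
        long h2 = mix(h1) | 1L; // force an odd step
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    // true means "possibly present"; false means "definitely absent".
    boolean mightContain(long element) {
        long h1 = mix(element + 0x9e3779b97f4a7c15L);
        long h2 = mix(h1) | 1L;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 100_000;
        SimpleBloomFilter filter = new SimpleBloomFilter(n, 0.01);
        Random present = new Random(1);
        for (int i = 0; i < n; i++) {
            filter.put(present.nextLong());
        }
        Random absent = new Random(2);
        int falsePositives = 0;
        for (int i = 0; i < n; i++) {
            if (filter.mightContain(absent.nextLong())) {
                falsePositives++;
            }
        }
        System.out.printf("observed false-positive rate: %.4f%n", (double) falsePositives / n);
    }
}
```

The observed rate printed by `main` should land near the configured 0.01, which is the sense in which a positive answer is only probably correct.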

NLP for Uzbek language

Natural language processing is an essential tool for text mining in the data analysis field. In this post, I want to share my approach to developing a stemmer for the Uzbek language. Uzbek is spoken by 27 million people around the world, and there is a large and growing amount of Uzbek text on the internet. While working on my weekend project "FlipUz" (a news aggregator for Uzbek news sites), I ran into the problem of automatically tagging news into different categories. This requires a good NLP library, and I was not able to find one for Uzbek. That is what motivated me to develop a stemmer for the Uzbek language. In short, stemming removes suffixes from the end of a word to expose its core part. For example: rabbits -> rabbit. As Uzbek is similar to Turkish, I was curious whether there is a stemmer for Turkish. And I found this: Turkish St...
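The stemming idea described above can be illustrated with a very small suffix stripper. This is only a sketch and not the stemmer developed in the post: the class `SuffixStemmer`, the six-entry suffix list, and the minimum stem length of 3 are all assumptions picked for illustration, and a real Uzbek stemmer would need a much larger, ordered set of suffix rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, deliberately tiny suffix-stripping stemmer.
class SuffixStemmer {
    // A few common Uzbek noun suffixes, longest first (illustrative, not exhaustive):
    // -ning (genitive), -dan (ablative), -lar (plural),
    // -ni (accusative), -ga (dative), -da (locative).
    private static final List<String> SUFFIXES =
            Arrays.asList("ning", "dan", "lar", "ni", "ga", "da");

    // Assumed lower bound so that very short words are never reduced to nothing.
    private static final int MIN_STEM_LENGTH = 3;

    // Repeatedly strips the first matching suffix until none applies.
    static String stem(String word) {
        String current = word.toLowerCase(Locale.ROOT);
        boolean stripped = true;
        while (stripped) {
            stripped = false;
            for (String suffix : SUFFIXES) {
                int stemLength = current.length() - suffix.length();
                if (current.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                    current = current.substring(0, stemLength);
                    stripped = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"kitoblarni", "kitoblardan", "maktabda"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

With this suffix list, "kitoblarni" is reduced in two passes ("ni", then "lar") to "kitob", which mirrors the rabbits -> rabbit example for an agglutinative language where suffixes stack.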