Basic Waartaa-ElasticSearch Integration – GSoC Week 1

ElasticSearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Waartaa-ElasticSearch integration is the first sub-task which begins the implementation of browse/search channel chat logs. After the GSoC bonding period was over I started working on it. Here is what I did in last week.

Creating mapping for Channel Logs index

Last week I spent most of my time reading ElasticSearch documentation and tutorials. I have got good understanding of what ElasticSearch is capable of. Mapping is telling ElasticSearch which field in document is searchable, defining their type and few more things. Here is the first version of mapping that I created. It will change over time as new fields may get added in document and we may also change the analyzer used for ‘message’ field to account for misspelled search terms like Google does it.

Indexing Channel Logs

If you have read my last blog post, you will know I already completed this task during GSoC community bonding period. In last week I just refactored it a bit, made it configurable in settings file and tested the code on my system. My mentor told me he is facing some issue with ElasticSearch secure installation. He will merge my PR as soon as installation is done.

Oh wait I forgot to tell you how I will be transferring older chat logs from MongoDB to ElasticSearch. One way is to write a script but there is always a better way to do such things. There is an open source river plugin available which does exactly what I want.

Generating permalinks for incoming chat messages

Each chat log/message will have it’s own unique permalink that users can send to each other to refer to some old conversation. There are many other use cases of it which I will not be discussing here now. There wasn’t much to do in this task because these are generated automatically in form of  ‘_id’ field of document that is indexed in ElasticSearch.

Conclusion

In terms of code, basic Waartaa-ElasticSearch integration is complete. I hope the problem with ElasticSearch installation gets resolved asap so that I can see my code in action in production 🙂

What’s next?

Next, I will be working on building an API to search/browse channel logs. Hopefully I will complete it by end of this week and will write a blog post about it too 🙂

<script>JS I love you</script>

Advertisements

3 thoughts on “Basic Waartaa-ElasticSearch Integration – GSoC Week 1

  1. What are the other operations that could be performed on a log file? Can a user delete a log file? if yes, then you should also take care of removing the entry from the search index

    • I think you misunderstood the scenario.

      Here is the scene- two people talking in an IRC channeI using Waartaa as a client. There chat messages/logs get indexed in ElasticSearch(ES). In IRC there is no way to edit your previous chat messages so the only operation that is performed is insert(i.e no update/delete) message/log in ES. BTW, it’s not difficult to remove an entry from search index in ES once it’s indexed. ES provides a robust API to do CRUD operations.

      I hope I have answered your query.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s