Mohit Taneja

Friday, August 16, 2013

Web Parsing and Custom Alerting

OK, so this post is about an amazing project in which I ended up solving a real-world problem. Inside CMU we have the CBDR website, which is used for posting research studies happening on the CMU campus in which students can participate, and most of them are paid studies. And since everyone at CMU, especially me, is almost broke because of the really high education fees (you can never be too greedy, CMU), this website is in high demand during the summer.

Now, there are a few bad things about this website: one cannot in any way register for email updates when a study is added, and most of the studies need a limited number of participants while willing participants are in abundance, so regular FCFS (first come, first served) applies. And even though I was broke, I didn't really have the time to visit the website 20 times a day. So I ended up writing a small script which checks whether a new study has been added on their website and, if so, sends me an email.

So this script was required to do a few fundamental tasks:

1.) Parse the website automatically at a regular interval.
2.) Check the current list of studies against a database of studies that I have already been informed about, and notify me only about the newly added ones.
3.) Send me an email when it identifies that a new study has been added.
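
The second task boils down to a set difference between what the site lists now and what has already been reported. Here is a minimal sketch of that idea, assuming each study can be reduced to an object with an `id` and a `title` (hypothetical field names, not necessarily what the real script or the CBDR page uses) and using an in-memory `Set` as a stand-in for the database:

```javascript
// Hypothetical shape: each study is { id, title }. The real page and the
// actual script may use different fields.
function findNewStudies(currentStudies, seenIds) {
  // Keep only the studies whose id has not been recorded yet.
  return currentStudies.filter((study) => !seenIds.has(study.id));
}

// Remember the given studies so they are not reported a second time.
function markAsSeen(studies, seenIds) {
  for (const study of studies) {
    seenIds.add(study.id);
  }
}

// Made-up data standing in for one scrape of the site.
const seenIds = new Set(['study-1']);
const firstScrape = [
  { id: 'study-1', title: 'Memory study' },
  { id: 'study-2', title: 'Decision-making study' },
];

const fresh = findNewStudies(firstScrape, seenIds);
markAsSeen(fresh, seenIds);
console.log(fresh.map((study) => study.title));

// Running the same scrape again reports nothing new.
console.log(findNewStudies(firstScrape, seenIds).length);
```

With a real database the `Set` lookups would become queries, but the filter-then-record order stays the same, so that a study is reported only once.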

For the first part I ended up using the zombie.js module in Node.js: classic headless-browser-based web parsing and automation. Injecting jQuery-based code is a bit of a pain with the framework, but one can always use the good old DOM objects.

For the database part, MongoDB was really easy to set up and get working with JavaScript, as it can directly store and retrieve JavaScript objects, so no conversion to or from strings is needed.

And finally, for sending emails, Randy at the CREATE Lab pointed me towards the Linux sendmail command, which needs mail utils to be set up, but once that is done one can easily use a shell script to send emails.
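
As a rough illustration of the email step, here is a sketch that pipes a message to `sendmail` straight from Node.js instead of going through a shell script, which is an assumption and not how the original script is described. The function names, the address and the subject are all made up; the `-t` flag asks sendmail to read the recipients from the message headers, and the sketch assumes a `sendmail` binary is available on the PATH:

```javascript
const { spawn } = require('child_process');

// Build a minimal plain-text message: headers, a blank line, then the body.
function buildMessage(to, subject, body) {
  return ['To: ' + to, 'Subject: ' + subject, '', body, ''].join('\n');
}

// Pipe the message to sendmail. Assumes sendmail is installed and on PATH.
function sendAlert(to, subject, body) {
  const child = spawn('sendmail', ['-t']);
  child.on('error', (err) => console.error('sendmail failed:', err.message));
  child.stdin.on('error', (err) => console.error('could not write message:', err.message));
  child.stdin.end(buildMessage(to, subject, body));
  return child;
}

// Only preview the message here; calling sendAlert() would really send it.
const preview = buildMessage(
  'me@example.com',
  'New CBDR study',
  'Decision-making study was just added.'
);
console.log(preview);
```

Keeping the message construction separate from the sending makes it possible to check the text without a mail setup on the machine.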

For deployment, instead of using my crappy laptop I ended up deploying the script on a t1.micro instance on Amazon EC2. Yes, I had to install Node.js, MongoDB and mail utils on the EC2 instance, but that's fairly straightforward.

For those of you interested in checking out the code, it is hosted on my GitHub account: https://github.com/tanejamohit/Parsing-Alert

Thankfully, now I am the first person to register for any study on CBDR. Sadly as the summer comes to an end, I doubt I will have a lot of time for any of the studies.

Tuesday, July 2, 2013

Practical Issues while Coding

Between the CREATE Lab internship, GSoC work, and the Startup Engineering course, I have been doing quite a lot of coding these days. I am learning some good lessons from all this coding, and thought it might be a good idea to jot them down for myself and share them with anyone who is interested.

The most important thing I realized is that the amount of time spent reading and understanding code is much, much more than the amount of time spent writing it. So it makes sense to write code in such a way that the time spent reading and understanding it is minimized. Also, 90% of the optimization happens in only 10% of the code, so it is sometimes better to write readable code than optimized code.

Explaining what the code in a particular file does, along with the licensing info, at the beginning of the file is actually a pretty good idea. And so is describing what a function does at the beginning of the function.
It's better to use elaborate, self-explanatory names for variables and functions rather than concise ones. It's even better to use function names which don't include technical lingo and can be understood by a user who doesn't know much about the library you are using. chooseVisual() is a much better function name than createEGLConfiguration().
Instead of aiming to write the most optimized code from the beginning, it is better to write the most readable code and then try to optimize it. Also, using temporary variables is not really that bad an idea if they make your code easier to understand.
It makes a lot of sense to use the same convention for naming functions and variables throughout the code base (assuming there are multiple engineers working on it). Also, if you use the same indentation style and commenting style, your code might start looking good too. As of now that seems to be the toughest thing to achieve: good-looking code. But I am working towards it, and hopefully I will get the hang of it.
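
To make the point about temporary variables and descriptive names concrete, here is a small made-up example (the functions and the pricing rule are purely illustrative and not taken from any of the projects mentioned above). Both versions do the same arithmetic, but the second one names each step:

```javascript
// Dense version: correct, but the reader has to unpack it in one go.
function calc(items, r) {
  return Math.round(items.reduce((t, i) => t + i.price * i.quantity, 0) * (1 + r) * 100) / 100;
}

// Readable version: the same steps, each stored in a named temporary.
function totalPriceWithTax(items, taxRate) {
  const subtotal = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  const totalWithTax = subtotal * (1 + taxRate);
  const roundedToCents = Math.round(totalWithTax * 100) / 100;
  return roundedToCents;
}

const cart = [
  { price: 10, quantity: 2 },
  { price: 5, quantity: 1 },
];
console.log(calc(cart, 0.1), totalPriceWithTax(cart, 0.1));
```

The extra variables cost nothing measurable here, and they give a reviewer three obvious places to check instead of one long expression.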

Sunday, October 7, 2012

The GSoC Experience : How to apply as a student ?

The first time I participated in GSoC, in 2010, I was in the final year of college. It has been a wonderful experience. That time I participated as a student, and this year I participated as a mentor, completing the full circle.

I plan to write about how to apply for GSoC, my experience as a student, and my experience as a mentor. Since that would be too much for a single post, for now I will stick to advice on how to apply as a student (assuming you are applying for the first time).

Once you have read enough about the program and familiarized yourself with how it works, the first major task is to select the organisation to which you plan to apply.

Selecting the Organisation :

Try to select an organisation whose work interests you, and look at organisations in a field where you want to work in the future. In my case, I was interested in computer graphics and game development, so I selected organisations like Crystalspace3D and Tux4Kids, along with some others working in domains which interested me. One can do this very simply by going through the list of orgs which were selected for GSoC last year (assuming that they will be selected this year too) and checking out their org description pages.
Once your search has been reduced to 8-10 orgs using the above criteria, it is time to delve a little deeper and try out the products or services of these orgs. If you are interested in operating systems, it might mean installing and trying out a lot of them, a.k.a. a pain in the ass. But who said applying to GSoC is easy? You should also learn more about the technology they use and the work currently going on in those organisations. In my opinion the best way to figure that out is to check out the projects they did last summer, their ideas page from the previous year and, if they have already compiled it, this year's ideas page for GSoC.
Once you are done with this, you should generally try to shorten your list to 2-3 orgs with which you will interact.

Interacting with the Community :

The most common mistake is to enter the IRC channel of an org and expect them to tell you about their product. Do your homework first. Before entering the IRC channel of an org and asking questions, read about the organisation on their website and wiki, try out their products, and definitely have a good look at their ideas page.
In fact, I would say that it is better to first select a project or two in which you are interested, and then discuss and ask about them. By doing this you can find out how important a particular project is for the community, and who would be mentoring it if a student is selected. You can also find out what the community expects a student working on that project to do, and how they expect the student to do it. I believe that gathering this information while interacting with a community can be a huge first step.
Once you know about a GSoC project and feel that you can do it and that it would be worth your time, it is time to set up the development environment for that project and learn more about the module on which you would be working (if you are working on an existing code base), or about the modules with which your module would be interacting. Also try to read blogs and research papers on the topic. In my case, I took a project on photon mapping, a global illumination method, and I had no clue what it was before applying for GSoC.
Once you know the topic and the theory behind it, it is time to visit the IRC channel again. This time you should discuss the code in detail, along with what exactly the community and your mentors expect you to do in that project if you are selected. Discuss the code you would have to write, the time it might take, and how you should approach it.

Preparing your Application :

Once you have a decent idea about the project and how you would do it, it's best to start writing your application. It is always better to be the first one to post a GSoC application on the mailing list. This helps in two ways. Firstly, some of the influential people and senior members of the community don't always hang around on IRC, so it becomes easier to get noticed by them and get their recommendations on your application too. Secondly, by being the first one to post your application on the mailing list, you nudge other students to check out some other project rather than the one in which you are interested.
Most organisations have a template for applications. But your application should generally include answers to a few questions: Who are you? What is the project you want to work on? How would you solve the problem or complete your project? What is your timeline?
It is also best to break your goals into primary and secondary goals. Primary goals are the ones which are directly related to your project and which you should definitely get implemented during the GSoC period, no matter what. Secondary goals are the ones you would want to implement during the GSoC period if you finish the primary goals on time. Make sure you and your mentors agree on the primary and secondary goals you write in your application.
Once you have posted your application, don't hesitate to ask questions like: Are my goals good enough for a GSoC project? Are these goals doable within the GSoC period? What contingencies should I keep in mind?
One very important part of the application is the project timeline. It is best to write down the timeline for your project week by week over the GSoC period. It helps you assess whether your goals are realistic, and whether they are substantial enough for a GSoC project. Also, once you have refined it and if you finally get selected, it serves as a timetable.

And in the last week before the application submission deadline, it is mostly about interacting with mentors, learning more about the project and the code base, and updating your application to stay in sync with what you learn.

Now, please don't treat this article as a guide to getting selected for GSoC. These are just some pointers from my personal experience which I think might help someone applying to GSoC for the first time.

I did my GSoC project with Crystalspace3D in 2010, and most of these opinions are based on my experience as a student in 2010 and as a mentor in 2012.

Monday, June 27, 2011

Do we really need google search..?

I have recently been thinking about the number of times I have to use Google search. Well, it has been decreasing for sure. The reason is that there are other places where I can search for things, and in a more specialized way.

If I have to search for books, I search on Flipkart; for movie shows, I search BookMyShow; for food outlets, I use Zomato; for torrents, isoHunt; and for knowledge-based articles, Wikipedia.

It seems that Google search has lost relevance to a certain extent today. I guess it is because it generally redirects you to places where you can search for things instead of searching for the things themselves.

Any thoughts..? Does anyone else feel like their Google searches have reduced..?

Sunday, June 19, 2011

Pygame + Desktop ------> Javascript + Canvas + Browser

The title is just about the change of platform for my personal projects. Till date I have worked on games focused on the desktop environment and have used Pygame + Python for them.

But recently I have been experimenting with Javascript + HTML5 canvas. I am just trying things out as of now, and am not yet an expert at any of it. But yeah, there are a lot of things which have to be done differently, and many things are similar too.

The rendering model for both of them is the same. In both cases you have to think of the screen as a piece of paper: once you draw anything on it, you cannot delete or change it. In every new frame you can reuse the same paper, clear some parts of it and then reuse it, or use a new paper altogether.

The event handling mechanism is different. In Python one handles every event, like a keypress, in a loop, and takes the appropriate action depending on the type of the event (checked with a chain of if/elif tests). In Javascript, one attaches a callback to each kind of event, and when the event occurs the callback function is called.
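
The two styles can be put side by side in a small sketch. Both halves are written in Javascript so they can be compared directly: the first imitates the Pygame-style loop over a queue of events, and the second uses Node's `EventEmitter` as a stand-in for the browser's callback registration. The event names and the `state` object are made up for illustration:

```javascript
const { EventEmitter } = require('events');

// Style 1: a Pygame-like loop that drains an event queue and branches on type.
function handleByPolling(queue, state) {
  for (const event of queue) {
    switch (event.type) {
      case 'keydown':
        state.keysPressed += 1;
        break;
      case 'quit':
        state.running = false;
        break;
      default:
        break; // ignore events we do not care about
    }
  }
}

// Style 2: browser-like callbacks, one registered per event type.
function attachCallbacks(emitter, state) {
  emitter.on('keydown', () => { state.keysPressed += 1; });
  emitter.on('quit', () => { state.running = false; });
}

const events = [
  { type: 'keydown' },
  { type: 'mousemove' },
  { type: 'keydown' },
  { type: 'quit' },
];

const polled = { keysPressed: 0, running: true };
handleByPolling(events, polled);

const viaCallbacks = { keysPressed: 0, running: true };
const emitter = new EventEmitter();
attachCallbacks(emitter, viaCallbacks);
for (const event of events) {
  emitter.emit(event.type, event);
}

console.log(polled, viaCallbacks);
```

In the polling style the game loop decides when events get looked at; in the callback style the environment decides, and the code only says what should happen.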

The good thing with Python was that it was cross-platform, so one was not required to take care of platform-specific details. That isn't the case with Javascript: the way Firefox handles events might be, and in some cases is, different from Safari or Chrome. For example, the simple code for finding the position of the mouse when it is clicked is different in Firefox and Safari. So, for such things one needs to use jQuery, another library, which makes traversing DOM objects easy and standardizes Javascript code across browsers. The good thing about jQuery is that it is pretty lightweight: the whole library ships as a single Javascript file.
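
For the mouse position example, one way to sidestep per-browser event properties, without jQuery, is to combine the click's viewport coordinates (`clientX`/`clientY`) with the element's bounding rectangle from `getBoundingClientRect()`. The sketch below is only an illustration of that idea and not necessarily what any of my code does; the function name is made up, and plain objects stand in for the real event and rectangle so that it runs outside a browser:

```javascript
// `event` needs clientX/clientY (viewport coordinates of the click) and
// `rect` needs left/top, as returned by element.getBoundingClientRect().
function clickPositionInElement(event, rect) {
  return {
    x: event.clientX - rect.left,
    y: event.clientY - rect.top,
  };
}

// Plain objects standing in for a real MouseEvent and DOMRect.
const fakeClick = { clientX: 150, clientY: 90 };
const fakeCanvasRect = { left: 100, top: 50 };

const position = clickPositionInElement(fakeClick, fakeCanvasRect);
console.log(position);
```

In a page this would be called from a click callback on the canvas, passing the event and `canvas.getBoundingClientRect()`.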

Most of the animation in Pygame was done by manipulating images; in fact, most of the game view was done using images/surfaces. In Javascript one tends to use the native drawing functions. Now, this might just be a misconception on my part, or it might just be the culture around canvas, which is still a work in progress, or it might just be a difference between the kinds of games I used to work on in Pygame and the ones I am trying on canvas. But this still seems to hold in general. It could also be because of speed issues in handling images in the browser versus in Pygame.

All in all, I am still learning new things and many of my findings here might not be true, but I will try to update them as and when I find out more.