Mohit Taneja

Sunday, October 7, 2012

The GSoC Experience : How to apply as a student ?

The first time I participated in GSoC was in 2010, when I was in the final year of college, and it was a wonderful experience. That time I participated as a student, and this year I participated as a mentor, completing the full circle.

I wanted to write about how to apply for GSoC, my experience as a student, and my experience as a mentor. Since that would be too much for a single post, for now I will stick to advice on how to apply as a student (assuming you are applying for the first time).

Once you have read enough about the program and familiarized yourself with how it works, the first major task is to select the organisation to which you plan to apply.

Selecting the Organisation :

Try to select an organisation whose work interests you, and look at organisations in a field where you want to work in the future. In my case, I was interested in computer graphics and game development, so I selected organisations like Crystalspace3D, Tux4Kids and some others working in domains which interested me. One can do this very simply by going through the list of orgs which were selected for GSoC last year (assuming that they would be selected this year too) and checking out their org description pages.
Once your search has been reduced to 8-10 orgs using the above criteria, it is time to delve a little deeper and try out the products or services of these orgs. If you are interested in operating systems, it might mean installing and trying out a lot of them, a.k.a. a pain in the ass. But who said applying to GSoC is easy? You should also learn more about the technology they use and the work currently going on in those organisations. In my opinion the best way to figure that out is to check out the projects they did last summer, their previous year's ideas page and, if they have compiled it, this year's ideas page for GSoC.
Once you are done with this, you should try to shorten your list to 2-3 orgs with which you should interact.

Interacting with the Community :

The most common mistake is to enter the IRC channel of an org and expect them to tell you about their product. Do your homework first. Before entering the IRC channel of an org and asking questions, read about the organisation on their website and their wiki, try out their product/s, and definitely have a good look at their ideas page.
In fact, I would say it is better to first select a project or two in which you are interested, and then discuss and ask about them. By doing this you can come to know how important a particular project is for the community, and who would be mentoring that project if a student is selected for it. You also learn what the community expects a student working on that project to do, and how they expect the student to do it. I believe that if one can gather this information while interacting with a community, it can be a huge first step.
Once you know about a GSoC project and feel that you can do it and that it would be worth your time, it is time to set up the development environment for that project and learn more about the module on which you would be working (if you are working on an existing code base), or about the modules with which your module would be interacting. Also try to read blogs and research papers on that particular topic. In my case, I took a project on the global illumination method photon mapping, and I had no clue what it was before applying for GSoC.
Once you know the topic and the theory behind it, it is time to visit the IRC channel again. This time you should be discussing in detail the code and what exactly the community and your mentors expect you to do in that project if you are selected. Discuss the code you would have to write, the time it might take and how you should approach it.

Preparing your Application :

Once you have a decent idea about the project and how you would be doing it, it's best to start creating your application. It is always better to be the first one to post a GSoC application on the mailing list. This helps in two ways. Firstly, some of the influential people in the community and senior members don't always hang around on IRC, so it becomes easier to get noticed and get recommendations about your application from them too. Secondly, by being the first one to post your application on the mailing list you nudge other students to check out some other project, rather than the one in which you are interested.
Most organisations have a template for an application. But your application should generally include answers to a few questions: Who are you? What is the project you want to work on? How would you solve the problem or complete your project? What is your timeline?
It is also best to break your goals into primary and secondary goals. Primary goals are the ones which are directly related to your project and which you should definitely get implemented during the GSoC period, no matter what. Secondary goals are the ones which you would want to implement during the GSoC period if you finish the primary goals on time. Make sure you and your mentors agree on the primary and secondary goals you write in your application.
Once you have posted your application, don't hesitate to ask questions like: Are my goals good enough for a GSoC project? Are these goals doable within the period of GSoC? What contingencies should I keep in mind?
One very important part of the application is the project timeline. It is best to write down the timeline for your project week by week for the whole GSoC internship. It helps you assess whether your goals are realistic, and whether they are good enough for a GSoC internship. Also, once you have refined it and if you finally get selected, it serves as a timetable.

And well, the last week before the application submission deadline is mostly about interacting with mentors, learning more about the project and the code base, and updating your application in sync with it.

Now, please don't treat this article as a guide to getting selected for a GSoC internship. These are just some pointers from my personal experience which I think might help someone applying for GSoC for the first time.

I did my GSoC internship with Crystalspace3D in 2010, and most of the opinions here are based on my experience as a student in 2010 and as a mentor in 2012.

Monday, June 27, 2011

Do we really need google search..?

I have recently been thinking about the number of times I have to use Google search. Well, it has been decreasing for sure. The reason is that there are other places where I can search for things, and that too in a localized way.

If I have to search for books, I search on Flipkart; if I have to search for movie shows, I search BookMyShow; if I have to search for food outlets, I use Zomato; for torrents, isoHunt; and for knowledge-based articles, Wikipedia.

It seems that Google search has lost relevance to a certain extent today. I guess it is because it generally redirects one to places where you can search for things, instead of searching for things for you.

Any thoughts..? Does anyone else also feel like their Google searches have reduced..?

Sunday, June 19, 2011

Pygame + Desktop ------> Javascript + Canvas + Browser

The topic is just about the change of platform for my personal projects. Till date I have worked on games focused on the desktop environment and have used Pygame + Python for them.

But recently I have been experimenting with Javascript + HTML5 canvas. I am just trying things out as of now, not yet an expert at anything. But yeah, there are a lot of things which are supposed to be done differently, and many things are similar too.

The rendering model for both of them is the same. In both cases you have to think of the screen as a piece of paper: once you draw anything on it, you cannot delete or change just that drawing. In every new frame you can either use the same paper as it is, clear some parts of it and then use it, or use a new paper altogether.
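
To make the paper idea concrete, here is a rough sketch in plain Python. It uses neither Pygame nor canvas; the text grid and the names `new_paper`, `draw_ball` and `render_frames` are made up purely for illustration. It moves a ball across three frames, once on the same paper and once on a fresh paper per frame.

```python
# A tiny text "canvas": once a cell is drawn, the canvas has no memory of
# what was drawn, only the resulting marks (the "piece of paper" model).
WIDTH, HEIGHT = 8, 3


def new_paper():
    """Return a blank sheet: HEIGHT rows of WIDTH empty cells."""
    return [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]


def draw_ball(paper, x, y):
    """Stamp a ball onto the paper; whatever was there is overwritten."""
    paper[y][x] = "o"


def render_frames(positions, clear_each_frame):
    """Draw one ball per frame, optionally starting each frame on clean paper."""
    paper = new_paper()
    for x, y in positions:
        if clear_each_frame:
            paper = new_paper()  # use a new sheet for this frame
        draw_ball(paper, x, y)
    return ["".join(row) for row in paper]


path = [(0, 1), (1, 1), (2, 1)]
smeared = render_frames(path, clear_each_frame=False)
clean = render_frames(path, clear_each_frame=True)
print("\n".join(smeared))
print()
print("\n".join(clean))
```

Without clearing, the middle row ends up as `ooo.....` because the old balls stay on the paper; with a fresh paper each frame it is `..o.....`.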

The event handling mechanism is different. In Python, one handles every event, like a keypress, in a loop, and depending on the type of the event (checked in an if/elif chain) one takes the appropriate action. In Javascript, one attaches a callback to each kind of event, and when the event occurs, the callback function is called.
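
The two styles can be sketched side by side in plain Python. This is only an illustration: the event dictionaries, `run_polling_loop` and the `Dispatcher` class are hypothetical stand-ins, not the Pygame or browser APIs.

```python
from collections import deque

# Hypothetical event records; real Pygame and DOM events carry more fields.
events = [
    {"type": "keydown", "key": "left"},
    {"type": "click", "pos": (10, 20)},
    {"type": "keydown", "key": "right"},
]


def run_polling_loop(pending):
    """Pygame style: drain the queue in a loop and branch on the event type."""
    log = []
    queue = deque(pending)
    while queue:
        event = queue.popleft()
        if event["type"] == "keydown":
            log.append("key:" + event["key"])
        elif event["type"] == "click":
            log.append("click:%d,%d" % event["pos"])
    return log


class Dispatcher:
    """Browser style: handlers are registered up front, then called on events."""

    def __init__(self):
        self.handlers = {}

    def add_listener(self, event_type, callback):
        self.handlers.setdefault(event_type, []).append(callback)

    def fire(self, event):
        for callback in self.handlers.get(event["type"], []):
            callback(event)


callback_log = []
dispatcher = Dispatcher()
dispatcher.add_listener("keydown", lambda e: callback_log.append("key:" + e["key"]))
dispatcher.add_listener("click", lambda e: callback_log.append("click:%d,%d" % e["pos"]))
for event in events:
    dispatcher.fire(event)

polling_log = run_polling_loop(events)
print(polling_log)
print(callback_log)
```

Both approaches end up handling the same events in the same order; the difference is who owns the loop. In the polling style your code does, in the callback style the environment does.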

The good thing with Python was that it is cross platform, so one was not required to take care of platform specific details. But that isn't the case with Javascript: the way Firefox handles events might be, and in some cases is, different from Safari or Chrome. For example, the simple code for finding the position of the mouse when it is clicked is different in Firefox and Safari. So for such things one needs to use jQuery, a library which makes traversing DOM objects easy and standardizes Javascript code across browsers. The good thing about jQuery is that it is pretty lightweight, shipping as a single, fairly small Javascript file.

Most of the animation in Pygame was supposed to be done by manipulating images; in fact most of the game view part was done using images/surfaces. In Javascript one tends to use native drawing functions. Now, this might be just a bias in my mind, or it might just be the culture, as canvas is still a work in progress, or it might be a difference between the themes of games I used to work on in Pygame and those I work on in canvas. But still, this seems to hold in general. It could also be because of speed issues of handling images in the browser versus in Pygame.

All in all, I am still learning new things, and many of my findings here might not be true, but I will try to update them as and when I find out more.

Friday, April 8, 2011

Gaming + Work + Movie + Less Sleep = Crazy Dreams

I return to my cubicle and open a ninja fighter app on my mobile, and just when I open it I am sucked in by my mobile and put in a virtual arena, where I have to fight against 4 Japanese black belt fighters. I see that my clothes have also changed from jeans + shirt to a karate dress with a black belt. And as I approach those fighters with full confidence, I am punched right on my nose. And I wake up from my dream.

This is what happens when you have been playing a lot of games on your phone, watched a late night show of Sucker Punch, have been overworked with issues related to thread synchronization, and have been missing out on sleep for quite some time.

Saturday, August 28, 2010

Cache Coherence Gyaan

Total Geek talk, noobs feel free to close the tab.

Finally, I realized that it is good to share technical knowledge, because the more you share the more you know.

Basics: a cache is a mid level memory between the CPU and the RAM. The simple funda behind a cache is that it is more expensive and much faster than your RAM (in terms of access time), so during processing some of the data from RAM, the data which is assumed to be accessed most often, is brought into the cache so that the system becomes more efficient.

Now, all is good as long as there is only one core, but if there are multiple cores sharing the same memory space, it creates a bit of a mess. One needs to know who is changing the memory contents, because each core has its own cache. Say there is a variable which is used by both cores, and both of them have cached it. Now, if core1 updates its value in its cache, or even if it updates its value in the memory, core2 needs to know that the value of this variable has changed and that it should refresh its copy before using it. Well, this is called the "Cache Coherency" problem. The problem statement is simple: "Figure out if something has been messed up, and if yes, what??"
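
The mess is easy to reproduce in a toy model. This is only a sketch, assuming plain Python dictionaries as stand-ins for the memory and the two per-core caches; the names `read` and `write` are made up for illustration.

```python
# Shared main memory, plus one private cache per core (plain dicts here).
memory = {"x": 1}
core1_cache = {}
core2_cache = {}


def read(cache, name):
    """Read through the cache: fetch from memory only on a miss."""
    if name not in cache:
        cache[name] = memory[name]
    return cache[name]


def write(cache, name, value):
    """Write to this core's cache and to memory, telling no other cache."""
    cache[name] = value
    memory[name] = value


read(core1_cache, "x")            # both cores cache x == 1
read(core2_cache, "x")
write(core1_cache, "x", 42)       # core1 updates x
stale = read(core2_cache, "x")    # core2 still hits its old copy
print(memory["x"], stale)
```

Even though memory now holds 42, core2 still reads 1 from its own cache, which is exactly the situation a coherence mechanism has to detect and repair.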

Before we delve into the details, we need some more basics. Because the cache is quite small compared to RAM (just as a spec, my laptop has 4 GB of RAM and 3 MB of cache), data keeps moving in and out of it, and there are two ways to decide when changed data goes back to memory (this applies even if there is a single cache): a write through cache or a write back cache. In a write through cache, the contents of the cache are written to memory as soon as they are updated in the cache. In a write back cache, changed contents stay only in the cache, and they are written back to memory only when they need to be swapped out of the cache to load some other memory contents.
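
Here is a small sketch of the two write policies, counting how many times memory gets written. It is a toy model under stated assumptions: the `Cache` class, its two-entry capacity and the oldest-first eviction rule are all hypothetical choices for illustration, not how any particular CPU works.

```python
class Cache:
    """A toy cache holding at most `capacity` entries, evicting the oldest first."""

    def __init__(self, memory, capacity, write_back):
        self.memory = memory
        self.capacity = capacity
        self.write_back = write_back
        self.lines = {}        # address -> value, in insertion order
        self.dirty = set()     # addresses changed in cache but not yet in memory
        self.memory_writes = 0

    def _store_to_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def _make_room(self):
        if len(self.lines) >= self.capacity:
            victim = next(iter(self.lines))        # oldest entry
            value = self.lines.pop(victim)
            if victim in self.dirty:               # write back only if modified
                self.dirty.discard(victim)
                self._store_to_memory(victim, value)

    def write(self, address, value):
        if address not in self.lines:
            self._make_room()
        self.lines[address] = value
        if self.write_back:
            self.dirty.add(address)                # defer the memory update
        else:
            self._store_to_memory(address, value)  # update memory immediately


def run(write_back):
    memory = {}
    cache = Cache(memory, capacity=2, write_back=write_back)
    for i in range(5):
        cache.write("a", i)        # five updates to the same address
    cache.write("b", 10)
    cache.write("c", 20)           # cache is full, so "a" is evicted
    return cache.memory_writes, memory.get("a")


print(run(write_back=False))
print(run(write_back=True))
```

In this run the write through cache touches memory 7 times, while the write back cache touches it only once, when "a" is evicted. In both cases memory ends up with the latest value of "a".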

Now, there are two ways to handle this situation :

The Software way
The Hardware way

In the software way of implementing cache coherency, it is the responsibility of the compiler, and sometimes the programmer (if he/she is programming in assembly language), to take care of it. A simple way of doing so is to flush the cache or mark it as invalid as soon as you write a shared/global variable. This is a very inefficient way, and there are many more efficient ways to do it, which generally depend on the hardware too.
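
A rough sketch of the naive flush approach follows. It assumes that a write to a shared variable goes to memory and that every other core's cache is then flushed wholesale; the `Core` class and `write_shared` are hypothetical names, and real software coherence uses hardware specific flush and invalidate instructions instead.

```python
memory = {"shared": 1, "a": 10, "b": 20}


class Core:
    def __init__(self):
        self.cache = {}
        self.misses = 0

    def read(self, name):
        if name not in self.cache:
            self.misses += 1
            self.cache[name] = memory[name]
        return self.cache[name]

    def flush(self):
        """Drop every cached entry, not just the one that changed."""
        self.cache.clear()


def write_shared(writer, others, name, value):
    """Naive software coherence: write to memory, then flush all other caches."""
    writer.cache[name] = value
    memory[name] = value
    for core in others:
        core.flush()


core1, core2 = Core(), Core()
for name in ("shared", "a", "b"):
    core2.read(name)                      # 3 cold misses
write_shared(core1, [core2], "shared", 2)
values = [core2.read(name) for name in ("shared", "a", "b")]  # 3 more misses
print(values, core2.misses)
```

core2 now sees the new value of `shared`, but it also had to reload `a` and `b`, which never changed. That is the price of flushing everything.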

Now, the problems with cache coherency through software are these. Firstly, it is inefficient: one might end up refreshing the whole cache even when only a few memory locations have changed. Secondly, and most importantly, code which depends on software cache coherency cannot be easily ported from one machine to another, as the other machine might have a different hardware configuration.

In the hardware way, there is special space reserved in the cache for cache coherence. The cache is in general divided into cache lines. Say a cache line consists of 100 bytes of data; then whenever data is swapped out of the cache into memory, or swapped into the cache, it is transferred in multiples of this size. One can imagine a cache as a 2D memory consisting of some number of lines. For each cache line there is a directory entry in the special space reserved for cache coherence, which is generally called the "cache directory". For each cache line there are 3 bits reserved in the directory which tell the state of the data in that cache line.
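
As a sketch of how addresses map to cache lines and how a per-line state could be tracked, consider the toy model below. The three states and the `CacheWithDirectory` class are assumptions made up for illustration, not the exact scheme of any processor. The line size of 100 bytes just follows the example above; real hardware typically uses a power of two such as 64.

```python
LINE_SIZE = 100  # bytes per cache line, matching the example in the text

# Hypothetical per-line states; the actual encoding depends on the protocol.
INVALID, SHARED, MODIFIED = 0, 1, 2


def line_of(address):
    """Return (line number, offset within the line) for a byte address."""
    return address // LINE_SIZE, address % LINE_SIZE


class CacheWithDirectory:
    def __init__(self):
        self.directory = {}   # line number -> state of that line

    def load(self, address):
        line, _ = line_of(address)
        # A line we do not hold (or hold as INVALID) becomes SHARED on load.
        if self.directory.get(line, INVALID) == INVALID:
            self.directory[line] = SHARED

    def store(self, address, peers):
        line, _ = line_of(address)
        self.directory[line] = MODIFIED
        for peer in peers:                 # tell other caches about the write
            if line in peer.directory:
                peer.directory[line] = INVALID


c1, c2 = CacheWithDirectory(), CacheWithDirectory()
c1.load(130)
c2.load(190)          # byte 190 sits in the same line as byte 130
c1.store(130, [c2])
print(line_of(130), line_of(190), c1.directory, c2.directory)
```

Note that c2 never touched byte 130, yet its copy of the line is invalidated, because the state is tracked per cache line and not per variable.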

The issues with hardware coherency are, first, that it generates a lot of inter-cache traffic: whenever a cache write occurs, that cache needs to send info about it to all the other caches so that they can update themselves. Secondly, the amount of cache memory which is usable for storing data decreases.

So which one is better ?

Well, it depends on the situation. If the program has only a few shared variables, then the hardware method is preferable, as the amount of inter-cache traffic is manageable. Whereas if there are many shared variables and they are updated by different threads quite often, then the software method is preferred, as in this case the inter-cache traffic would be too much, and it is fine to flush the whole cache at once instead of updating lines one by one as the hardware method does.

Now, how about a system in which we can dynamically switch between software coherency and hardware coherency? This is a paper which talks about such a system, and it also inspired me to write this blog.