Data Science Glossary

I am sure there are many people like me who are in the process of mastering data science skills. Some might be thinking about picking this area for learning and as a possible future career option.

In the last 3 years of my Data Science journey, I came across many online resources. The ones that helped me during my learning are quietly sitting in my favorites/bookmarks. I thought it would be a good idea to start a series of posts sharing my experience as a self-taught data scientist. It's a steep learning curve, especially if you don't enjoy working with data and are not very interested in statistics or in learning programming languages like Python or R. But it isn't IMPOSSIBLE either. All you need is a commitment to learn, the experiences of people who have pursued data science learning, and some motivation along the way. I am still in the process of documenting my personal journey, hoping that it may help someone. On that note, until I write that post, I would like to share the discussion "Can I become a self-taught data scientist?". I hope it gives aspirants some ideas, and maybe some motivation too.

If you have already begun the journey, I am sure you are overwhelmed by the various concepts and terms you need to be comfortable with before you reach a confident level across data science topics. For someone in that position, having a data science glossary handy will definitely help with revising and revisiting during preparation. I came across the following 2 resources and would like to share them here with my readers:

And if you have a fair understanding of data science topics and are wondering what real processes or projects look like, then please do check out this Azure Machine Learning documentation: "What is the Team Data Science Process?". I have come across many links, documents and papers on practical usage; however, this one is really well structured and yet simple enough to make the overall picture clear. In future blog posts I will try to share other references as appropriate.

My intent [hopefully a New Year resolution 🙂] is to begin sharing my thoughts on Data Science as regularly & frequently as I can. Having said that, that's it for now. Stay tuned to my blog if you are interested in learning about my experiences with data science and related topics.


Book Review : Visualize This

My data science journey started with an interest in learning different data visualization techniques. From the start of my career I had been influenced by concept visualization. Recently, I even experimented with infographics in my resume. If you search for visualization on the internet, there are plenty of resources around information graphics, presentations, etc. I never thought there would be a book primarily on data visualization techniques (and not on the tools for graphics visualization). I picked this book up from my library the moment I saw "statistics" in its tagline. After reading it, I totally agree that it really is a guide, as it claims to be.


  • Chapter 1 : Telling Stories with Data – It's an age-old saying that storytelling is an art. Time and again, successful people around the world have used it to express their vision, be it in social, political or business discussions. On the other hand, data is not new; it existed even before computing evolved. As data grows, drawing insights, making decisions & conveying them to stakeholders will keep becoming more complex. To keep it simple, you need to present it in a visually impressive, to-the-point way that leaves little decoding for the recipients of the data. This chapter explains precisely that in short & simple language.
  • Chapter 2 : Handling Data – The core is data, so you need to understand how to gather it, where to look for sources, what to do if you have unstructured data, and how to get it into your hands in a meaningful format/schema. This chapter gives you a very good understanding of that. Going further, a hands-on data scraping exercise makes sure you get familiar with collecting real-world data scattered over multiple web pages/sources into one file for analysis.
  • Chapter 3 : Choosing Tools to Visualize – As the title suggests, it's about choosing visualization methods for given data. It covers a variety of tools, from out-of-the-box commercial tools to scripting & programming options. The number of references given, supported by hands-on exercises, is the most important part for me. It's very easy to mention something at a high level and drop the topic. But making readers try some of it always helps them connect with the book.
  • Chapter 4 : Visualizing Patterns over Time, Chapter 5 : Visualizing Proportions, Chapter 6 : Visualizing Relationships, Chapter 7 : Spotting Differences and Chapter 8 : Visualizing Spatial Relationships are at the heart of the book. With hands-on exercises in each chapter, it explains how different types of objectives can be met on a variety of datasets available for visualization. There are many relevant examples in these chapters, along with instructions you can try on your own machine.
  • Chapter 9 : Designing with a Purpose – The wrap-up chapter nicely explains how the acquired knowledge can be enhanced. Like any communication, it's not just about your views/words; it's also about the recipients' requirements, their understanding of the topics, their awareness of related contexts & most importantly their interest in relating to your views and words. Preparing yourself to handle such things will make you not just better at data visualization, but more meaningful in how you practice it.

So, how did this book help me?

  • First & most importantly, it corrected my understanding of visualization. My knowledge was limited to graphs & charts. It lifted me from there to basic plots and then to visualizing data on maps. Yes, by the end of the book, I was able to draw a map of India and do some sample analysis. I will post what I learned in upcoming blog posts.
  • It helped me learn Python & R. At the beginning of the book I was struggling to write Python code. Because of the book, I pushed myself to learn it well. To my surprise, I wrote my own Python code for data mining. The data mining part is complete, but I am still working on analyzing and visualizing the data in R. Once that is done, I will try to write a blog post about the experience.
  • Sometime last year, I had taken a free course on R from an online learning portal. This book took me to new levels in using R, and I thoroughly enjoyed it. The book is full of R examples that will make you comfortable using the tool.
  • I had an opportunity to get hands-on with tools like Adobe Illustrator and Tableau. As I had to rely on trial versions, my use was limited, but it was good enough to get the basic context.
  • The book has ~125 website references. I went through all of them, reading at least the referenced page and what it had to explain in that context. About 80% of these websites were new to me, so you can imagine how much I must have learned through these references. Some of the references are about popular data sources/topics like world population, education & crime rate analysis in the USA, Obama's presidential run and analysis of debate topics, NBA games & player analysis, etc. Others point to popular websites for data science, blogs, tools, communities, etc.
  • I took a reading pause on this book as I was reading another one on "How to become a data scientist". Looking back, it nicely complemented my data science study. Most importantly, "Visualize This" helped me get more serious & focused about data science.

So to conclude, this book not only helped me learn Data Visualization; it also became a catalyst in my data science study. It's the prime motivator behind my Python & R learning, and it will remain a guide in my future data visualization assignments. Thanks, Nathan Yau, for writing this wonderful guide.

To know more about the book, you can check it out at or see a short video introduction to the book at

Lastly, stay tuned for my upcoming blog posts on how I applied what I learned to datasets beyond the ones in this book.

Book Review : The Definitive Guide to Becoming a Data Scientist

I have always enjoyed working with data, especially data visualization. Before leaving my job at Microsoft, I was involved in a project (lasting more than a year) where analyzing data was crucial for planning the list of applications & desktops for migration. I was involved in data mining, scrubbing, analyzing & reporting information to the customer and team. Back then I enjoyed my work but never thought of taking it to the next level. Then, about 7-8 months ago, I decided to pursue data science. After some initial browsing/searching on the internet, I was lost because of the overwhelming amount of information about it. I was not sure where to start, what the market & jobs looked like, which of my existing skills I could use, how I should go about filling the skill gaps, and so on…

This book came to me at just the right time (sometime in April 2015).


Without this book, I would have lost interest, but this book brought me back on track with the required focus. The journey towards becoming a data scientist is still long. From here on, if I lose my way, it will only be my fault for not persisting with further practice and the learning curve. But if I am successful in becoming a Data Scientist, the major credit will go to this book and its author, Zacharias Voulgaris.

So here is what I liked a lot about this book:

  • It's not just a book; it's first-hand experience shared on how to become a Data Scientist and what it takes.
  • The first chapter beautifully explains the differences between Big Data, Data Analyst and Data Science (Scientist). It sets the stage nicely for exploring data science.
  • The explanation of big data with four V's (Volume, Velocity, Variety & Veracity) gives a clear understanding.
  • I personally feel that documenting mindset requirements is the toughest part of any learning, especially in today's world, where people go through various levels of stress to keep up with the competition. The chapter on mindset requirements can help immensely in assessing and working on the mindset required to become a data scientist.
  • Building further on mindset requirements, Chapters 5, 6 & 7 nicely explain Technical Qualifications, Experience and, most importantly, Networking, to build not just skills but the connections needed to establish yourself in the data science community.
  • Chapter 8 explains the software used. Chapter 9 builds further, explaining how to keep Learning New Things (which is the most important aspect) and Tackling Problems (the real reason to be a scientist).
  • Chapter 10 gives a good summary of Machine Learning, R & Statistics. The scenarios on when to use which one give a good perspective on the tools and the problems at hand.
  • Chapter 11 dives into data science processes. Whatever you learned in the earlier chapters will start making more sense with the way this topic is written.
  • Building further, Chapter 12 talks about the specific skills required. What I liked is that it covers a variety of profiles, from experienced professionals to students, and gives guidance on reviewing/building the skills required for the job. This makes introspection easy: you can assess your skill gaps and start thinking about a learning plan.
  • Chapters 13 & 14 are nicely crafted around Where to Look for a Data Science Job and presenting your candidature when applying for jobs/work. Chapter 15 also talks about the Freelance Track. It will help both types of people: those who want to freelance while in a job to build an alternate career/income, and those who want to freelance full time by choice.
  • In any learning, case studies make the material more relevant, as all of us like to hear real stories and examples. Chapters 16-18 share stories of real people, from junior to experienced data scientists.
  • Keeping yourself updated with trends, tools & techniques is super important to being a good data scientist. The glossary, reference websites and offline books sections are an excellent add-on that will ensure you have the pointers you need to stay on track.

I have only 1 suggestion. There is a lot of text, which is expected on such a crucial topic, and let's not expect shortcuts for that. However, some good visuals could make a lot of difference in keeping the reader engaged and interested.

Conclusion – If you want to put your data science learning on the fast track, go grab this book.