Data Science Glossary

I am sure there are many like me who are in the process of mastering data science skills. Some might be thinking about picking up this area for learning and as a possible future career option.

In the last 3 years of my Data Science journey, I came across many online resources. The ones which helped me during my learning are quietly sitting in my favorites/bookmarks. I thought it would be a good idea to start some posts sharing my experience as a self-taught data scientist. It’s a steep learning curve, especially if you don’t enjoy data and aren’t much interested in statistics or in learning programming languages like Python or R. But it isn’t IMPOSSIBLE either. All you need is a commitment to learn, the experiences of people who have pursued data science learning, and some motivation along the way. I am still in the process of documenting my personal journey, hoping that it may help someone. On that note, until I write that post, I would like to share the discussion Can I become a self-taught data scientist? I hope it gives aspirants some idea, and maybe some motivation too.

If you have already begun the journey, I am sure you will be overwhelmed by the various concepts and terms you need to be comfortable with before you reach confidence in data science topics overall. For someone like that, having a data science glossary handy will definitely help with revising and revisiting during preparation. I came across the following 2 resources and would like to share them here with my readers:

And if you have a fair understanding of data science topics and are wondering what the real process or projects look like, then please do check out this Azure Machine Learning documentation: What is the Team Data Science Process? I have come across many links, documents and papers on practical usage; however, this one is really well structured and yet simple enough to make the overall picture clear. In a future blog post I will try to share other references as appropriate.

My intent [ hopefully a new year resolution 🙂 ] is to begin sharing my thoughts on Data Science as regularly & frequently as I can. Having said that, that’s it for now. Stay tuned to my blog if you are interested in learning about my experiences with data science and related topics.


Book Review: The Definitive Guide to Becoming a Data Scientist

I have always enjoyed working with data, especially data visualization. Before leaving my job at Microsoft, I was involved in a project (lasting more than a year) where analyzing data was crucial for planning the list of applications & desktops for migration. I was involved in data mining, scrubbing, analyzing & reporting information to the customer and the team. Back then I was enjoying my work but never thought of taking it to the next level. Then, about 7-8 months back, I decided to pursue data science. After some initial browsing/searching on the internet I was lost because of the overwhelming amount of information about it. I was not sure where to start, what the market & jobs look like, which of my existing skills I could use, how I should go about filling the skill gaps and so on…

This book came to me at just the right time (sometime in April 2015).


Without this book, I would have lost interest, but it brought me back on track with the required focus. The journey towards becoming a data scientist is still long. From here on, if I lose my way, it will only be my own fault for not persisting with further practice and learning. But if I am successful in my pursuit of becoming a Data Scientist, the major credit will go to this book and its author, Zacharias Voulgaris.

So here is what I liked a lot about this book:

  • It’s not just a book; it’s a firsthand account of how to become a data scientist and what it takes.
  • The first chapter beautifully explains the differences between Big Data, Data Analyst and Data Science (Scientist). It sets the stage nicely for exploring data science.
  • The explanation of big data through the four V’s (Volume, Velocity, Variety & Veracity) gives a clear understanding.
  • I personally feel that documenting the required mindset is the toughest part of any learning, especially in today’s world where people go through various levels of stress to keep up with the competition. The chapter on mindset requirements can help immensely in assessing and building the mindset required to become a data scientist.
  • Building further on the mindset requirements, Chapters 5, 6 & 7 nicely explain Technical Qualifications, Experience and, most importantly, Networking, which builds not just skills but also the connections needed to establish yourself in scientist communities.
  • Chapter 8 explains the Software Used. Chapter 9 builds further, explaining how to keep Learning New Things (which is the most important aspect) and Tackling Problems (the real reason to be a scientist).
  • Chapter 10 gives a good summary of Machine Learning, R & Statistics. The scenarios on when to use which one give a really good perspective on matching tools to the problem at hand.
  • Chapter 11 dives into the Data Science process. Whatever you learnt in the earlier chapters starts making more sense thanks to the way this topic is written.
  • Building further, Chapter 12 talks about the specific skills required. What I liked is that it covers a variety of profiles, from experienced professionals to students, and gives guidance on reviewing and building the skills required for the job. This makes it easy to introspect, assess skill gaps and start thinking about a learning plan.
  • Chapters 13 & 14 are nicely crafted around Where to Look for a Data Science Job and presenting your candidature when applying for jobs/work. Chapter 15 also talks about the Freelance Track. It will help both types of people: those who want to pursue freelancing while in a job to build an alternate career/income, and those who want to freelance full time by choice.
  • In any learning, case studies make the material more relevant, as all of us like to hear real stories and examples. Chapters 16-18 share stories of real people, from junior to experienced data scientists.
  • Keeping yourself updated with the trends, tools & techniques is super important to being a good data scientist. The glossary, reference websites and offline books sections are an impressive add-on which will ensure you have the pointers required to stay on track.

I have only one suggestion. There is a lot of text, which is expected for such a crucial topic, and let’s not expect shortcuts for that. However, some good visuals can make a lot of difference in keeping the reader engaged and interested.

Conclusion – If you want to put your data science learning on the fast track, go grab this book.