Gathering Data from Indonesia's Governmental Online Platforms
I-Perspective | Aug. 4, 2024, 7:35 p.m.
As someone who is passionate about data, I often take on personal projects to sharpen my skills. To make those projects more engaging and relevant, I decided to focus on collecting data from my home country, Indonesia. In today’s era of big data and advanced analytics, governmental data platforms are crucial for informing policy decisions, advancing research, and supporting data science initiatives. Indonesia, with its rich cultural and socio-economic diversity, provides a wealth of data through these platforms. However, leveraging this data for analytical purposes presents several challenges.
1. Fragmented Data Sources

One of the primary challenges in gathering data from Indonesian governmental platforms is the fragmentation of data sources. Some platforms are meant to serve as a single data portal, such as satudata, the provincial open data portals, and BPS. Even so, various government departments and agencies maintain their own databases and online platforms, each with its own format and standards. This fragmentation makes it difficult to compile a comprehensive dataset for analysis.
2. Inconsistent Data Formats
Data across different platforms often comes in inconsistent formats. While some platforms might provide data in user-friendly formats such as CSV or JSON, others may offer data in less accessible formats like PDFs or proprietary systems. This inconsistency adds a layer of complexity to data aggregation and normalization.
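To make the normalization step a little more concrete, here is a minimal sketch that reads the same kind of table from CSV and from JSON and maps both onto one set of column names. The sample data, the column names, and the rename map are all made up for illustration; real portals will use their own field names.

```python
import csv
import io
import json


def records_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_from_json(text):
    """Parse JSON text that holds an array of objects into a list of dicts."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return data


def normalize(record, rename):
    """Lowercase and strip column names, then apply a rename map."""
    out = {}
    for key, value in record.items():
        clean = key.strip().lower()
        out[rename.get(clean, clean)] = value
    return out


# Hypothetical samples: the column names and values are invented for this sketch.
csv_text = "Provinsi,Tahun,Jumlah\nJawa Barat,2023,100\n"
json_text = '[{"province": "Bali", "year": 2023, "value": 50}]'

# Assumed mapping from Indonesian headers to a common English schema.
rename = {"provinsi": "province", "tahun": "year", "jumlah": "value"}

rows = [normalize(r, rename) for r in records_from_csv(csv_text)]
rows += [normalize(r, rename) for r in records_from_json(json_text)]
```

Note that values read from CSV stay as strings, so a type conversion step would still be needed before the two sources can be compared directly. PDFs are a different story and usually need a dedicated extraction tool.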
3. Incomplete Data
Data completeness is another significant issue. Many governmental platforms may not update their datasets regularly or may provide only partial information. This lack of completeness can undermine the reliability of analyses and lead to skewed conclusions.
4. Data Accuracy and Validation
Ensuring the accuracy of the data is crucial. Errors in data entry, outdated information, or inconsistencies across different platforms can affect the validity of the analysis. Validating and cleaning data becomes a substantial task, often requiring extensive manual effort.
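Some of that manual effort can be reduced with a few automated checks. The sketch below flags missing required fields and duplicate rows; the field names and sample rows are hypothetical, and real validation rules would depend on the dataset.

```python
def validate(rows, required, key_fields):
    """Return a list of (row_index, problem) tuples for two simple checks."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Check 1: every required field must be present and non-empty.
        for field in required:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                problems.append((i, f"missing {field}"))
        # Check 2: the combination of key fields must be unique.
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            problems.append((i, "duplicate key"))
        seen.add(key)
    return problems


# Hypothetical rows, invented for this sketch.
rows = [
    {"province": "Bali", "year": 2023, "value": 50},
    {"province": "Bali", "year": 2023, "value": 50},
    {"province": "Aceh", "year": 2023, "value": ""},
]

problems = validate(
    rows,
    required=["province", "year", "value"],
    key_fields=["province", "year"],
)
```

Here the second row is reported as a duplicate and the third as having a missing value. Checks like these do not prove the data is correct, but they narrow down where to look.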
5. Limited API Access
Access to data through Application Programming Interfaces (APIs) can greatly facilitate data collection and integration. However, many Indonesian governmental platforms lack robust API support, making automated data extraction challenging. Where APIs are available, they may be poorly documented or have restrictive access limits.
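When an API or page is available but unreliable or rate limited, one common approach is to retry failed requests with an increasing delay. The following is a minimal sketch of that idea, not tied to any particular platform: `fetch` stands for whatever function performs the actual request, and the attempt count and delays are arbitrary assumptions.

```python
import time


def fetch_with_retry(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or attempts run out.

    The wait doubles after each failure (1s, 2s, 4s, ...). The last
    failure is re-raised so the caller can see what went wrong.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting behaviour can be swapped out, for example in tests. If a platform publishes explicit rate limits, those should take priority over the guessed delays used here.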
6. Technical Infrastructure

Some platforms may suffer from technical issues, such as slow loading times, frequent downtime, or poor website design, which can significantly impede data extraction efforts. A key challenge I encountered was with the BMKG platform, which restricts data collection to just one month at a time, rather than allowing access to quarterly or annual datasets. This limitation makes it difficult to gather data spanning several years, which is what I needed at the time, and it is what inspired me to write this article. Furthermore, the lack of adequate technical support makes these difficulties even harder to work around.
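With a one-month limit like the one described above, a multi-year pull has to be split into month-sized requests. The sketch below only generates the start and end dates for each month in a range; it makes no assumption about how the BMKG site itself is queried, and the years used are just an example.

```python
import calendar
from datetime import date


def month_windows(start_year, start_month, end_year, end_month):
    """Yield (first_day, last_day) for each month in the inclusive range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Example range: four full years, which gives 48 monthly windows.
windows = list(month_windows(2020, 1, 2023, 12))
```

Each pair in `windows` can then be passed to whatever download step is used, one request per month, ideally with a pause between requests so the platform is not overloaded.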